886 results for Optimal test set


Relevance: 80.00%

Abstract:

This paper describes the UPM system for the translation task at the EMNLP 2011 workshop on statistical machine translation (http://www.statmt.org/wmt11/), which was used for both directions: Spanish-English and English-Spanish. The system is based on Moses, with two new modules for pre- and post-processing the sentences. The main contribution is the proposed method, based on similarity with the source-language test set, for selecting the sentences used to train the models and adjust the weights. With this system, we obtained 23.2 BLEU for Spanish-English and 21.7 BLEU for English-Spanish.
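The abstract does not say how similarity to the test set is measured, so the following is only a minimal sketch of the general idea: rank candidate sentences by their similarity to the pooled source-side test sentences and keep the top `k`. The bag-of-words cosine measure and the names `bag_of_words`, `cosine` and `select_similar` are assumptions made for illustration, not the UPM system's actual method.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Lowercased whitespace tokens with their counts."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_similar(candidates, test_sentences, k):
    """Return the k candidates most similar to the pooled test-set profile.

    Assumption: similarity is bag-of-words cosine; the paper may use
    a different measure.
    """
    profile = Counter()
    for sentence in test_sentences:
        profile.update(bag_of_words(sentence))
    ranked = sorted(
        candidates,
        key=lambda s: cosine(bag_of_words(s), profile),
        reverse=True,
    )
    return ranked[:k]
```

In practice the selected sentences would then be passed to the usual training or tuning step; a stronger similarity measure (for example n-gram overlap) could be swapped in without changing the selection logic.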

Relevance: 80.00%

Abstract:

Autonomous mobile robots and remotely operated robots are used successfully in very diverse scenarios, such as home cleaning, movement of goods in warehouses, and space exploration. However, as in other areas of computing, it is difficult to ensure the absence of defects in the programs that control these devices. There are different measures of the quality of a system in performing the functions for which it was designed; one of them is reliability. In most physical systems, reliability degrades as the system ages, generally because of wear. In software systems this does not usually happen, since defects are generally not acquired over time but introduced during development. If we focus on the coding stage of the software development process, we could consider a study to determine the reliability of different algorithms that are equally valid for the same task, according to the defects that programmers may introduce. This basic study may have several applications, such as choosing the algorithm least sensitive to programming defects for the development of a critical system, or establishing more demanding verification and validation procedures when an algorithm with high sensitivity to defects must be used. In this thesis, we studied the influence of certain types of software defects on the reliability of three multivariable speed controllers (PID, Fuzzy and LQR) acting on a specific mobile robot. The hypothesis is that the controllers studied offer different reliability when affected by similar defect patterns, and it has been confirmed by the results. From the viewpoint of experimental planning, we first conducted the tests necessary to determine whether controllers of the same family (PID, Fuzzy or LQR) offered similar reliability under the same experimental conditions. Once this was confirmed, a class representative was chosen at random from each controller family to undergo a more comprehensive battery of tests, with the purpose of obtaining data to compare the reliability of the controllers under study more completely. Because a large number of tests cannot be performed with a real robot, and to prevent damage to a device that generally has a significant cost, it was necessary to build a multicomputer simulator of the robot. This simulator was used both to obtain well-tuned controllers and to carry out the tests required for the reliability experiment.

Relevance: 80.00%

Abstract:

This paper describes the UPM system for the Spanish-English translation task at the NAACL 2012 workshop on statistical machine translation. The system is based on Moses. We used all the freely available corpora, cleaning them and deleting some repetitions. We also propose a technique for selecting the sentences used to tune the system, based on their similarity to the sentences to be translated. With this approach, the BLEU score improves from 28.37% to 28.57%, and in the WMT12 challenge we obtained 31.80% BLEU on the 2012 test set. Finally, we describe the experiments carried out after the competition.

Relevance: 80.00%

Abstract:

Several methods to improve multiple distant microphone (MDM) speaker diarization based on Time Delay of Arrival (TDOA) features are evaluated in this paper. All of them avoid the use of a single reference channel to calculate the TDOA values and, based on different criteria, select among all possible pairs of microphones a set of pairs that will be used to estimate the TDOAs. The evaluated methods are named "Dynamic Margin" (DM), "Extreme Regions" (ER), "Most Common" (MC), "Cross Correlation" (XCorr) and "Principal Component Analysis" (PCA). All methods improve the baseline results on the development set, and four of them also improve the results on the evaluation set. Relative DER improvements of 3.49% and 10.77% are obtained for DM and ER, respectively, on the test set. The XCorr and PCA methods achieve relative DER improvements of 36.72% and 30.82% on the test set. Moreover, the computational cost of the XCorr method is 20% lower than that of the baseline.
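To make the TDOA feature concrete, the sketch below estimates the delay between one pair of microphone signals as the lag that maximizes their plain cross-correlation. This is an illustrative assumption: the abstract does not state which estimator is used (diarization systems often use a weighted variant), and the function name `estimate_delay` and its parameters are hypothetical.

```python
def estimate_delay(ref, sig, max_lag):
    """Return the lag (in samples) at which sig best matches ref.

    A positive result means sig is a delayed copy of ref.
    Assumption: plain time-domain cross-correlation, searched over
    lags in [-max_lag, max_lag].
    """
    best_lag = 0
    best_score = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x in enumerate(ref):
            j = i + lag
            if 0 <= j < len(sig):
                score += x * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

The pair-selection methods in the abstract decide which microphone pairs to compute such delays for; that selection step is not shown here.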

Relevance: 80.00%

Abstract:

Flash floods are of major relevance in natural disaster management in the Mediterranean region. In many cases, the damaging effects of flash floods can be mitigated by adequate management of flood control reservoirs. This requires the development of suitable models for optimal operation of reservoirs. A probabilistic methodology for calibrating the parameters of a reservoir flood control model (RFCM) that takes into account the stochastic variability of flood events is presented. This study addresses the crucial problem of operating reservoirs during flood events, considering downstream river damages and dam failure risk as conflicting operation criteria. These two criteria are aggregated into a single objective of total expected damages from both the maximum released flows and stored volumes (overall risk index). For each selected parameter set, the RFCM is run under a wide range of hydrologic loads (determined through Monte Carlo simulation). The optimal parameter set is obtained through the overall risk index (balanced solution) and then compared with other solutions of the Pareto front. The proposed methodology is implemented at three different reservoirs in the southeast of Spain. The results obtained show that the balanced solution offers a good compromise between the two main objectives of reservoir flood control management.
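The comparison between a balanced solution and the rest of the Pareto front can be sketched as follows. Each candidate parameter set is represented by a pair of objective values to be minimized (for example downstream damage and dam-failure risk). Using the plain sum of the two objectives as the aggregate is an assumption made for illustration; the paper's overall risk index is defined in terms of total expected damages, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the points that no other point dominates (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def balanced_solution(points):
    """Pick the front point with the smallest aggregate objective.

    Assumption: the aggregate is the plain sum of the objectives.
    """
    return min(pareto_front(points), key=sum)
```

With the hypothetical objective pairs `[(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)]`, the front keeps `(1, 5)`, `(2, 2)` and `(5, 1)`, and the balanced choice is `(2, 2)`.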

Relevance: 80.00%

Abstract:

The aim of this thesis is to characterize thermal generation that is representative of actual power plants, then to model and simulate it within a standard electrical grid, and to carry out economic-environmental multiobjective optimization studies. To this end, the current energy and electricity context is analysed first, in particular the peninsular one, where fuel-oil power plants have disappeared and only combined cycles and coal plants of different ranks remain. Next, the main environmental impacts of combustion-based power plants are analysed, represented above all by their CO2, SO2 and NOx emissions, together with the measures for controlling and mitigating them and the applicable regulations. Then, starting from the fuel properties and the information on unit heat rates, the thermal units are characterized with respect to the relevant functions that define their energy, economic and environmental behaviour, in terms of hourly output functions that depend on load. Optional denitrification and desulfurization are considered. Since there are multiple objective functions and they conflict with one another, multiobjective methods were chosen; these can identify the set of optimal points, or Pareto front, in which no solution can be improved in one objective function without being worsened in another. Several multiobjective optimization methods were analysed, and the ε-constraint method was selected because it can find non-convex fronts and the strict Pareto optimality of its solutions can be checked. A balanced representation of anthracite, domestic and imported bituminous coal, lignite and combined-cycle plants was integrated into the IEEE-57 test network, which can accommodate seven plants without greatly distorting the actual rated power of the units, and an AC optimal power flow with the multiobjective method integrated was programmed in Matlab. The Pareto fronts of cost combined with each of the three types of emission were identified, as well as the front of all four objectives together, giving the optimal system costs over the whole range of emissions. The cost to the system of reducing one additional tonne of any type of emission by moving to cleaner generation mixes is evaluated. The points found guarantee that, for a given level of emissions, the cost cannot be improved, or that, for a given cost, emissions cannot be reduced any further. It is also shown how to use the Pareto fronts to draw up optimal production strategies in response to hourly load changes.
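The ε-constraint idea named in this abstract can be sketched on a small discrete example: minimize cost subject to an emissions cap ε, then sweep ε to trace the front. The thesis solves an AC optimal power flow in Matlab at each step; the version below replaces that solver with a search over a hypothetical list of `(cost, emissions)` candidates, so it illustrates only the sweep, not the power-flow model.

```python
def epsilon_constraint(candidates, eps_values):
    """Trace a cost/emissions front with the epsilon-constraint sweep.

    candidates: iterable of (cost, emissions) pairs (hypothetical data).
    eps_values: emissions caps to sweep over.
    For each cap, keep the cheapest candidate whose emissions do not
    exceed it; ties on cost are broken by lower emissions.
    """
    front = []
    for eps in eps_values:
        feasible = [c for c in candidates if c[1] <= eps]
        if not feasible:
            continue  # no generation mix meets this cap
        best = min(feasible)  # tuples compare by cost first, then emissions
        if best not in front:
            front.append(best)
    return front


def marginal_cost(point_a, point_b):
    """Extra cost per unit of emissions avoided when moving from a to b."""
    return (point_b[0] - point_a[0]) / (point_a[1] - point_b[1])
```

Adjacent points on the resulting front give the cost of one additional unit of emission reduction, which is the quantity the abstract describes as the cost of reducing an additional tonne.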

Relevance: 80.00%

Abstract:

Different theoretical approaches have been used in studies of biomolecular systems with the aim of contributing to the treatment of various diseases. For neuropathic pain, for example, the study of compounds that interact with the sigma-1 receptor (Sig-1R) can elucidate the main factors associated with their biological activity. To this end, Quantitative Structure-Activity Relationship (QSAR) studies using Partial Least Squares (PLS) regression and Artificial Neural Network (ANN) methods were applied to 64 Sig-1R antagonists belonging to the 1-arylpyrazole class. PLS and ANN models were used to describe linear and nonlinear behaviour, respectively, between a set of descriptors and the biological activity of the selected compounds. The PLS model was obtained with 51 compounds in the training set and 13 compounds in the test set (r² = 0.768, q² = 0.684 and r²test = 0.785). Leave-N-out tests, randomization of the biological activity and outlier detection confirmed the robustness and stability of the models and showed that they were not obtained by chance correlations. Models were also generated with the Multilayer Perceptron Artificial Neural Network (MLP-ANN); the 6-12-1 architecture, trained with tansig-tansig transfer functions, gave the best response for predicting the biological activity of the compounds (r²training = 0.891, r²validation = 0.852 and r²test = 0.793). Another approach was used to simulate the environment of synaptic membranes using lipid bilayers composed of POPC, DOPE, POPS and cholesterol. The molecular dynamics studies showed that high cholesterol concentrations reduce the area per lipid and lateral diffusion, and increase the membrane thickness and the order parameter values, owing to the ordering of the acyl chains of the phospholipids.
The lipid bilayers obtained can be used to simulate interactions between lipids and small molecules or proteins, contributing to research on diseases such as Alzheimer's and Parkinson's. The approaches used in this thesis are essential for the development of new research in Computational Medicinal Chemistry.

Relevance: 80.00%

Abstract:

In the present study, multilayer perceptron (MLP) neural networks were applied to help in the diagnosis of obstructive sleep apnoea syndrome (OSAS). Oxygen saturation (SaO2) recordings from nocturnal pulse oximetry were used for this purpose. We performed time and spectral analysis of these signals to extract 14 features related to OSAS. The performance of two different MLP classifiers was compared: maximum likelihood (ML) and Bayesian (BY) MLP networks. A total of 187 subjects suspected of suffering from OSAS took part in the study. Their SaO2 signals were divided into a training set with 74 recordings and a test set with 113 recordings. BY-MLP networks achieved the best performance on the test set with 85.58% accuracy (87.76% sensitivity and 82.39% specificity). These results were substantially better than those provided by ML-MLP networks, which were affected by overfitting and achieved an accuracy of 76.81% (86.42% sensitivity and 62.83% specificity). Our results suggest that the Bayesian framework is preferred to implement our MLP classifiers. The proposed BY-MLP networks could be used for early OSAS detection. They could contribute to overcome the difficulties of nocturnal polysomnography (PSG) and thus reduce the demand for these studies.
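The accuracy, sensitivity and specificity figures quoted above are standard functions of the confusion-matrix counts. The helper below shows the arithmetic; the function name is hypothetical, and any counts passed to it are the caller's own, since the abstract does not report the underlying confusion matrices.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion counts.

    tp/fn: diseased subjects classified correctly / missed.
    tn/fp: healthy subjects classified correctly / flagged in error.
    """
    sensitivity = tp / (tp + fn)  # fraction of true cases detected
    specificity = tn / (tn + fp)  # fraction of healthy subjects cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity
```

For example, with made-up counts of 40 true positives, 10 false negatives, 45 true negatives and 5 false positives, the function returns an accuracy of 0.85, a sensitivity of 0.8 and a specificity of 0.9.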

Relevance: 80.00%

Abstract:

Urinary proteomics is emerging as a powerful non-invasive tool for the diagnosis and monitoring of a variety of human diseases. We tested whether signatures of urinary polypeptides can contribute to the existing biomarkers for coronary artery disease (CAD). We examined a total of 359 urine samples from 88 patients with severe CAD and 282 controls. Spot urine was analyzed using capillary electrophoresis coupled on-line to ESI-TOF-MS, enabling characterization of more than 1000 polypeptides per sample. In a first step, a "training set" for biomarker definition was created. Multiple biomarker patterns clearly distinguished healthy controls from CAD patients, and we extracted 15 peptides that define a characteristic CAD signature panel. In a second step, the ability of the CAD-specific panel to predict the presence of CAD was evaluated in a blinded study using a "test set." The signature panel showed a sensitivity of 98% (95% confidence interval, 88.7-99.6) and a specificity of 83% (95% confidence interval, 51.6-97.4). Furthermore, after therapeutic intervention the peptide pattern changed significantly toward the healthy signature, correlating with the level of physical activity. Our results show that urinary proteomics can identify CAD patients with high confidence and might also play a role in monitoring the effects of therapeutic interventions. The workflow is amenable to routine clinical testing, suggesting that non-invasive proteomics analysis can become a valuable addition to other biomarkers used in cardiovascular risk assessment.

Relevance: 80.00%

Abstract:

Major histocompatibility complex (MHC) II proteins bind peptide fragments derived from pathogen antigens and present them at the cell surface for recognition by T cells. MHC proteins are divided into Class I and Class II. Human MHC Class II alleles are grouped into three loci: HLA-DP, HLA-DQ, and HLA-DR. They are involved in many autoimmune diseases. In contrast to HLA-DR and HLA-DQ proteins, the X-ray structure of the HLA-DP2 protein has been solved quite recently. In this study, we have used structure-based molecular dynamics simulation to derive a tool for rapid and accurate virtual screening for the prediction of HLA-DP2-peptide binding. A combinatorial library of 247 peptides was built using the "single amino acid substitution" approach and docked into the HLA-DP2 binding site. The complexes were simulated for 1 ns, and the short-range interaction energies (Lennard-Jones and Coulomb) were used as binding scores after normalization. The normalized values were collected into quantitative matrices (QMs), and their predictive abilities were validated on a large external test set. The validation shows that the best-performing QM consisted of Lennard-Jones energies normalized over all positions for anchor residues only, plus cross terms between anchor residues.
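A quantitative matrix is typically applied by summing, position by position, the contribution of each residue in a candidate binding core. The sketch below assumes a purely additive matrix without the cross terms mentioned in the abstract; the matrix values, the three-residue window and the function names are made up for illustration and are not taken from the paper.

```python
def score_peptide(peptide, qm):
    """Sum the per-position contributions of a peptide of length len(qm).

    qm is a list with one dict per position, mapping a residue letter to
    its normalized score. Residues missing from the matrix contribute 0.
    """
    return sum(qm[pos].get(res, 0.0) for pos, res in enumerate(peptide))


def best_core(sequence, qm):
    """Slide a window of len(qm) residues and return the top-scoring one."""
    n = len(qm)
    windows = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    return max(windows, key=lambda w: score_peptide(w, qm))


# Hypothetical three-position matrix, for illustration only.
TOY_QM = [
    {"A": 1.0, "L": 0.5},
    {"K": 0.8},
    {"F": 1.2, "A": 0.1},
]
```

Screening a protein then amounts to calling `best_core` on its sequence and ranking proteins or peptides by the resulting score; a real matrix for an MHC class II binding core would have nine positions.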

Relevance: 80.00%

Abstract:

This paper presents some forecasting techniques for energy demand and price prediction, one day ahead. These techniques combine wavelet transform (WT) with fixed and adaptive machine learning/time series models (multi-layer perceptron (MLP), radial basis functions, linear regression, or GARCH). To create an adaptive model, we use an extended Kalman filter or particle filter to update the parameters continuously on the test set. The adaptive GARCH model is a new contribution, broadening the applicability of GARCH methods. We empirically compared two approaches of combining the WT with prediction models: multicomponent forecasts and direct forecasts. These techniques are applied to large sets of real data (both stationary and non-stationary) from the UK energy markets, so as to provide comparative results that are statistically stronger than those previously reported. The results showed that the forecasting accuracy is significantly improved by using the WT and adaptive models. The best models on the electricity demand/gas price forecast are the adaptive MLP/GARCH with the multicomponent forecast; their MSEs are 0.02314 and 0.15384 respectively.

Relevance: 80.00%

Abstract:

Subunit vaccine discovery is an accepted clinical priority. The empirical approach is time- and labor-consuming and can often end in failure. Rational information-driven approaches can overcome these limitations in a fast and efficient manner. However, informatics solutions require reliable algorithms for antigen identification. All known algorithms use sequence similarity to identify antigens. However, antigenicity may be encoded subtly in a sequence and may not be directly identifiable by sequence alignment. We propose a new alignment-independent method for antigen recognition based on the principal chemical properties of protein amino acid sequences. The method is tested by cross-validation on a training set of bacterial antigens and external validation on a test set of known antigens. The prediction accuracy is 83% for the cross-validation and 80% for the external test set. Our approach is accurate and robust, and provides a potent tool for the in silico discovery of medically relevant subunit vaccines.

Relevance: 80.00%

Abstract:

Cleavage by the proteasome is responsible for generating the C terminus of T-cell epitopes. Modeling the process of proteasome cleavage as part of a multi-step algorithm for T-cell epitope prediction will reduce the number of non-binders and increase the overall accuracy of the predictive algorithm. Quantitative matrix-based models for prediction of the proteasome cleavage sites in a protein were developed using a training set of 489 naturally processed T-cell epitopes (nonamer peptides) associated with HLA-A and HLA-B molecules. The models were validated using an external test set of 227 T-cell epitopes. The performance of the models was good, identifying 76% of the C-termini correctly. The best model of proteasome cleavage was incorporated as the first step in a three-step algorithm for T-cell epitope prediction, where subsequent steps predicted TAP affinity and MHC binding using previously derived models.

Relevance: 80.00%

Abstract:

Background: HLA-DPs are class II MHC proteins mediating immune responses to many diseases. Peptides bind MHC class II proteins in the acidic environment within endosomes. Acidic pH markedly elevates association rate constants, but dissociation rates are almost unchanged in the pH range 5.0 - 7.0. This pH-driven effect can be explained by the protonation/deprotonation states of histidine, whose imidazole has a pKa of 6.0. At pH 5.0, the imidazole ring is protonated, making histidine positively charged and very hydrophilic, while at pH 7.0 the imidazole is unprotonated, making histidine less hydrophilic. We develop here a method to predict peptide binding to the four most frequent HLA-DP proteins: DP1, DP41, DP42 and DP5, using a molecular docking protocol. Dockings to virtual combinatorial peptide libraries were performed at pH 5.0 and pH 7.0. Results: The X-ray structure of the peptide - HLA-DP2 protein complex was used as a starting template to model by homology the structure of the four DP proteins. The resulting models were used to produce virtual combinatorial peptide libraries constructed using the single amino acid substitution (SAAS) principle. Peptides were docked into the DP binding site using AutoDock at pH 5.0 and pH 7.0. The resulting scores were normalized and used to generate Docking Score-based Quantitative Matrices (DS-QMs). The predictive ability of these QMs was tested using an external test set of 484 known DP binders. They were also compared to existing servers for DP binding prediction. The models derived at pH 5.0 predict better than those derived at pH 7.0 and showed significantly improved predictions for three of the four DP proteins when compared to the existing servers. They are able to recognize 50% of the known binders in the top 5% of predicted peptides.
Conclusions: The higher predictive ability of DS-QMs derived at pH 5.0 may be rationalised by the additional hydrogen bond formed between the backbone carbonyl oxygen belonging to the peptide position before p1 (p-1) and the protonated ε-nitrogen of His 79β. Additionally, protonated His residues are well accepted at most of the peptide binding core positions which is in a good agreement with the overall negatively charged peptide binding site of most MHC proteins. © 2012 Patronov et al.; licensee BioMed Central Ltd.