311 results for OUTLIERS
Abstract:
Groundwater protection is a priority of EU environmental policy. The EU has therefore established a framework for preventing and controlling pollution, which includes provisions for assessing the chemical status of groundwater and reducing the presence of contaminants in it. The measures include:
• criteria for assessing the chemical status of groundwater bodies
• criteria for identifying significant and sustained upward trends in contaminant concentrations and for defining starting points for reversing such trends
• preventing and limiting indirect discharges of pollutants resulting from percolation through soil or subsoil.
The basic tools for developing these policies are the Water Framework Directive and the Groundwater Daughter Directive. According to them, groundwater bodies are considered to be in good chemical status if:
• the measured or predicted concentration of nitrates does not exceed 50 mg/l and that of active ingredients of pesticides, their metabolites and reaction products does not exceed 0.1 μg/l (0.5 μg/l for the total of pesticides measured)
• the concentration of certain hazardous substances is below the threshold value set by the Member States; these include, at least, ammonium, arsenic, cadmium, chloride, lead, mercury, sulphates, trichloroethylene and tetrachloroethylene
• the concentration of any other contaminant conforms to the definition of good chemical status set out in Annex V of the Water Framework Directive
• where a quality standard or threshold value is exceeded, an investigation confirms, among other things, the absence of significant risk to the environment.
Analysing the statistical behaviour of data from monitoring networks can be considerably complex, because such data are usually positively skewed and asymmetrically distributed owing to outliers, different soil types and mixtures of pollutants. Furthermore, certain components in groundwater may have concentrations below the detection limit, or their distribution may be non-stationary because of linear or seasonal trends. In the first case, the unknown values must be estimated, using procedures that vary with the percentage of values below the detection limit and the number of applicable detection limits. In the second case, trends must be removed before hypothesis tests are conducted on the residuals.
This PhD thesis aims to establish the statistical basis for the rigorous analysis of data from quality monitoring networks, in order to assess the chemical status of groundwater bodies, determine upward trends in pollutant concentrations and detect significant deterioration, both where a quality standard has been set by the competent environmental authority and where it has not. To design a methodology covering the variety of existing cases, data from the Official Groundwater Chemical Status Monitoring and Control Network of the Ministry of Agriculture, Food and Environment (Magrama) were analysed. Then, since River Basin Management Plans are the basic tool of the Directives, the Júcar river basin was selected because of its designation as a pilot basin in the Common Implementation Strategy (CIS) of the European Commission. The main objective of the working groups created for this purpose was to implement the Groundwater Daughter Directive and the related elements of the Water Framework Directive, especially data collection at monitoring points and preparation of the first River Basin Management Plan.
Given the size of the area, and in order to analyse a single groundwater body (defined as the management unit in the Directives), the Plana de Vinaroz Peñíscola was selected as a pilot area, and the procedures developed were applied to determine the chemical status of that body. The data examined generally do not contain concentration values for pollutants associated with point sources, so the study uses concentrations of the most commonly measured parameters, namely nitrates and chlorides. The strategy combines trend analysis with confidence intervals when a quality standard exists and prediction intervals when no standard exists or the standard has been exceeded. A similar approach was followed for values below the detection limit: the available values from the Plana de Sagunto pilot area were taken and different degrees of censoring were simulated, so that the results could be compared with the intervals obtained from the actual data and the effectiveness of the method verified. The end result is a general methodology that integrates the existing cases and makes it possible to define the chemical status of a groundwater body, verify the existence of significant impacts on groundwater quality and evaluate the effectiveness of the programmes of measures adopted under the River Basin Management Plan.
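The distinction between a confidence interval (for the mean concentration) and a prediction interval (for a single future measurement) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of the thesis: the function names, the sample nitrate values, the 95% level and the decision rule are hypothetical, and a normal quantile is used in place of Student's t, which makes the intervals somewhat too narrow for small samples. Trend removal and censored values are not handled.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(values, level=0.95):
    """Two-sided interval for the mean concentration (normal approximation)."""
    n = len(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(values) / sqrt(n)
    centre = mean(values)
    return centre - half_width, centre + half_width


def prediction_interval(values, level=0.95):
    """Two-sided interval for one future measurement (normal approximation)."""
    n = len(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(values) * sqrt(1 + 1 / n)
    centre = mean(values)
    return centre - half_width, centre + half_width


# Hypothetical nitrate concentrations (mg/l) from one monitoring point.
nitrate = [38.0, 41.5, 44.0, 39.5, 47.0, 42.5, 45.0, 40.0]
standard = 50.0  # nitrate quality standard quoted in the abstract, mg/l

ci_low, ci_high = confidence_interval(nitrate)
pi_low, pi_high = prediction_interval(nitrate)

# Assumed decision rule: flag an exceedance only when the whole
# confidence interval for the mean lies above the standard.
exceeds_standard = ci_low > standard
print(f"CI for the mean: ({ci_low:.1f}, {ci_high:.1f}) mg/l, exceeds: {exceeds_standard}")
print(f"PI for the next sample: ({pi_low:.1f}, {pi_high:.1f}) mg/l")
```

The prediction interval is always the wider of the two, because it accounts for the scatter of an individual measurement in addition to the uncertainty of the mean.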
Abstract:
A recent study of the divergence times of the major groups of organisms as gauged by amino acid sequence comparison has been expanded and the data have been reanalyzed with a distance measure that corrects for both constraints on amino acid interchange and variation in substitution rate at different sites. Beyond that, the availability of complete genome sequences for several eubacteria and an archaebacterium has had a great impact on the interpretation of certain aspects of the data. Thus, the majority of the archaebacterial sequences are not consistent with currently accepted views of the Tree of Life which cluster the archaebacteria with eukaryotes. Instead, they are either outliers or mixed in with eubacterial orthologs. The simplest resolution of the problem is to postulate that many of these sequences were carried into eukaryotes by early eubacterial endosymbionts about 2 billion years ago, only very shortly after or even coincident with the divergence of eukaryotes and archaebacteria. The strong resemblances of these same enzymes among the major eubacterial groups suggest that the cyanobacteria and Gram-positive and Gram-negative eubacteria also diverged at about this same time, whereas the much greater differences between archaebacterial and eubacterial sequences indicate these two groups may have diverged between 3 and 4 billion years ago.
Abstract:
We introduce a method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). SVMs are considered a supervised computer learning method because they exploit prior knowledge of gene function to identify unknown genes of similar function from expression data. SVMs avoid several problems associated with unsupervised clustering methods, such as hierarchical clustering and self-organizing maps. SVMs have many mathematical features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers. We test several SVMs that use different similarity metrics, as well as some other supervised learning methods, and find that the SVMs best identify sets of genes with a common function using expression data. Finally, we use SVMs to predict functional roles for uncharacterized yeast ORFs based on their expression data.
Abstract:
Different theoretical approaches have been used in studies of biomolecular systems with the aim of contributing to the treatment of various diseases. For neuropathic pain, for example, the study of compounds that interact with the sigma-1 receptor (Sig-1R) can elucidate the main factors associated with their biological activity. To this end, Quantitative Structure-Activity Relationship (QSAR) studies using Partial Least Squares (PLS) regression and Artificial Neural Network (ANN) methods were applied to 64 Sig-1R antagonists belonging to the 1-arylpyrazole class. PLS and ANN models were used to describe linear and nonlinear relationships, respectively, between a set of descriptors and the biological activity of the selected compounds. The PLS model was obtained with 51 compounds in the training set and 13 compounds in the test set (r² = 0.768, q² = 0.684 and r²test = 0.785). Leave-N-out tests, randomization of the biological activity and outlier detection confirmed the robustness and stability of the models and showed that they were not obtained by chance correlations. Models were also generated with a Multilayer Perceptron Artificial Neural Network (MLP-ANN); the 6-12-1 architecture, trained with tansig-tansig transfer functions, gave the best response for predicting the biological activity of the compounds (r²training = 0.891, r²validation = 0.852 and r²test = 0.793). Another approach was used to simulate the environment of synaptic membranes using lipid bilayers composed of POPC, DOPE, POPS and cholesterol. The molecular dynamics studies showed that high cholesterol concentrations reduce the area per lipid and lateral diffusion and increase the membrane thickness and order parameter values, as a result of the ordering of the phospholipid acyl chains.
The lipid bilayers obtained can be used to simulate interactions between lipids and small molecules or proteins, contributing to research on diseases such as Alzheimer's and Parkinson's. The approaches used in this thesis are essential for the development of new research in Computational Medicinal Chemistry.
Abstract:
We consider a robust version of the classical Wald test statistics for testing simple and composite null hypotheses for general parametric models. These test statistics are based on the minimum density power divergence estimators instead of the maximum likelihood estimators. An extensive study of their robustness properties is given through the influence functions as well as the chi-square inflation factors. It is theoretically established that the level and power of these robust tests are stable against outliers, whereas the classical Wald test breaks down. Some numerical examples confirm the validity of the theoretical results.
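The contrast between the classical and the robust test can be illustrated for the simplest case, a normal mean with known standard deviation. The sketch below is an assumption-laden toy example rather than the paper's general construction: the function names, the tuning parameter alpha = 0.5, the data and the known-sigma normal model are all chosen here for illustration. For this model the minimum density power divergence estimate solves a weighted-mean equation, and its asymptotic variance is sigma² (1 + alpha)³ / (1 + 2 alpha)^(3/2).

```python
from math import exp
from statistics import mean, median


def mdpde_location(values, sigma, alpha, tol=1e-10, max_iter=200):
    """Minimum density power divergence estimate of a normal mean, sigma known.

    Solves sum((x - mu) * exp(-alpha * (x - mu)**2 / (2 * sigma**2))) = 0
    by fixed-point iteration (a weighted mean with data-dependent weights).
    """
    mu = median(values)  # robust starting point
    for _ in range(max_iter):
        weights = [exp(-alpha * (x - mu) ** 2 / (2 * sigma ** 2)) for x in values]
        new_mu = sum(w * x for w, x in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def wald_type_statistic(values, mu0, sigma, alpha):
    """Wald-type statistic for H0: mu = mu0; alpha = 0 gives the classical test."""
    n = len(values)
    if alpha == 0:
        estimate, variance = mean(values), sigma ** 2
    else:
        estimate = mdpde_location(values, sigma, alpha)
        # Asymptotic variance of the estimator under the normal model.
        variance = sigma ** 2 * (1 + alpha) ** 3 / (1 + 2 * alpha) ** 1.5
    return n * (estimate - mu0) ** 2 / variance


# Hypothetical sample centred near 0 with a single gross outlier (10.0).
sample = [-0.5, 0.2, -0.1, 0.4, 0.0, -0.3, 0.1, 0.3, -0.2, 10.0]
CRITICAL_95 = 3.841  # 95% quantile of the chi-square distribution, 1 d.f.

classical = wald_type_statistic(sample, mu0=0.0, sigma=1.0, alpha=0.0)
robust = wald_type_statistic(sample, mu0=0.0, sigma=1.0, alpha=0.5)
print(f"classical rejects H0: {classical > CRITICAL_95}")
print(f"robust rejects H0:    {robust > CRITICAL_95}")
```

In this toy sample the single outlier is enough to make the classical statistic exceed the critical value, while the robust statistic stays well below it because the outlier receives a weight close to zero.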
Abstract:
The Internet of Things is a new communication paradigm that extends the virtual world (the Internet) into the real world through interfaces and interaction between objects. It will comprise a large number of interconnected heterogeneous devices, which are expected to generate a large volume of data. One of the important challenges for its development is storing and processing this large volume of data within acceptable time intervals. This research addresses that challenge by introducing analysis and pattern recognition services in the lower layers of the Internet of Things model, seeking to reduce processing in the upper layers. The research analysed reference models for the Internet of Things and platforms for developing applications in this context. The new architecture implemented extends the LinkSmart Middleware with a pattern recognition module that implements algorithms for value estimation, outlier detection and cluster discovery on raw data coming from data sources. The new module was integrated with the Hadoop Big Data platform and uses the algorithm implementations of the Mahout framework. This work highlights the importance of cross-layer communication integrated into this new architecture. The experiments carried out in the research used real datasets from the Smart Santander project in order to validate the new IoT architecture integrated with the analysis and pattern recognition services and the cross-layer communication.
Abstract:
The premise of this research is that the disclosure of environmental information, in the context of provisions and contingent liabilities, responded to advances in accounting standards. Generic accounting regulation on the disclosure of uncertain obligations was limited, around 1976, to Law No. 6.404, and remained so for at least a decade and a half, when it began to be developed. Over the years, mandatory disclosure standards were created, with more detailed judgment criteria for classifying an uncertain obligation as probable, possible or remote. Although still somewhat subjective, the development of these criteria may have contributed to reducing information asymmetry: companies came to have a clearer set of guidelines and were therefore better able to ascertain and disclose their uncertain obligations. This advance helped environmental obligations gain greater exposure, especially among potentially polluting companies, such as those in the electricity sector, which use natural resources and modify the environment. In this context, the objective of this study was to analyse the evidence of environmental liabilities disclosed by companies in the electricity sector from 1997 to 2014. To this end, a qualitative, descriptive and longitudinal study was carried out through content analysis of 941 explanatory notes, from a population of 64 electricity-sector companies listed on BM&FBovespa in May 2015. The sample consisted of 26 companies, which published a total of 468 explanatory notes on the CVM website from 1997 to 2014. Over these 18 years, 14 companies in the sample (53.85%) disclosed environmental liabilities at least once and 12 (46.15%) did not; of the 468 explanatory notes, 100 (21.37%) disclosed environmental liabilities.
The number of disclosures of environmental liabilities was small around 1997 but rose, increasing more consistently from 2006, the year that coincides with the approval of Norma e Procedimento de Contabilidade 22 (Provisions, Liabilities, Contingent Liabilities and Contingent Assets), issued by IBRACON. In addition, quantitative materiality averaged 0.61% for environmental provisions and 0.89% for environmental contingent liabilities, excluding outliers. The length of the explanatory notes, in number of words, grew and varied. In conclusion, accounting disclosure can, in addition to voluntary disclosure, be a plausible means of disclosing environmental issues and reducing information asymmetry, especially when accounting standards become clearer and more detailed.
Abstract:
Customizing shoe manufacturing is one of the great challenges in the footwear industry. It is a change of production model in which design becomes not only the main actor but also the main bottleneck. It is therefore necessary to accelerate this process by improving the accuracy of current methods. Rapid prototyping techniques are based on the reuse of manufactured footwear lasts so that they can be modified with CAD systems, leading rapidly to new shoe models. In this work, we present a fast shoe last reconstruction method that fits current design and manufacturing processes. The method is based on scanning the shoe last to obtain sections and establishing a fixed number of landmarks on those sections to reconstruct the 3D surface of the shoe last. Automated landmark extraction is accomplished through the use of the self-organizing network, the growing neural gas (GNG), which is able to topographically map the low dimensionality of the network to the high dimensionality of the contour manifold without requiring a priori knowledge of the input space structure. Moreover, our GNG landmark method is tolerant to noise and eliminates outliers. Our method accelerates the surface reconstruction and filtering processes used by current shoe last design software by up to 12 times. The proposed method offers higher accuracy than methods of similar efficiency, such as voxel grid.
Abstract:
3D sensors provide valuable information for mobile robotic tasks such as scene classification or object recognition, but these sensors often produce noisy data that make it impossible to apply classical keypoint detection and feature extraction techniques. Therefore, noise removal and downsampling have become essential steps in 3D data processing. In this work, we propose the use of a 3D filtering and downsampling technique based on a Growing Neural Gas (GNG) network. The GNG method is able to deal with outliers present in the input data. These features allow it to represent 3D spaces, obtaining an induced Delaunay triangulation of the input space. Experiments show how state-of-the-art keypoint detectors improve their performance when using the GNG output representation as input data. Descriptors extracted on the improved keypoints achieve better matching in robotics applications such as 3D scene registration.
Abstract:
In this study, we utilise a novel approach to segment out the ventricular system in a series of high resolution T1-weighted MR images. We present a fast reconstruction method for the brain ventricles. The method is based on processing brain sections and establishing a fixed number of landmarks on those sections to reconstruct the 3D surface of the ventricles. Automated landmark extraction is accomplished through the use of the self-organising network, the growing neural gas (GNG), which is able to topographically map the low dimensionality of the network to the high dimensionality of the contour manifold without requiring a priori knowledge of the input space structure. Moreover, our GNG landmark method is tolerant to noise and eliminates outliers. Our method accelerates the classical surface reconstruction and filtering processes. The proposed method offers higher accuracy than methods of similar efficiency, such as Voxel Grid.
Abstract:
Master's thesis in Mathematics Applied to Economics and Management, presented to the Universidade de Lisboa, through the Faculdade de Ciências, 2016
Abstract:
Measuring human capital has been a significant challenge for economists because the main variable of interest is intangible and not directly observable. In the Middle Eastern and Northern African region the task is further complicated by the general scarcity of comparable and reliable data. This study overcomes these challenges by relying on a unique international survey that covers most of the region and by deriving a market-based measure that uses returns to education and various labour market factors as guidance. The results show that private returns to schooling are relatively low in most southern Mediterranean countries (SMC). Israel and Turkey are clear outliers, surpassing even the EU-MED averages. In Algeria and Jordan, the returns are almost flat, implying that earnings do not respond significantly to education levels. Despite high attainment levels, Greece, Spain and Portugal also perform badly; only marginally surpassing some of the bottom-ranked SMC, providing evidence of problems in absorption capacity. The baseline scenarios for 2030 show substantial sensitivity to current estimates on returns to education. In particular, improving attainment levels can produce measurable gains in the future only when the returns to education are already high. Such is the case for Egypt, Morocco and Turkey, which substantially improve their human capital stocks under the baseline scenarios, surpassing several EU-MED countries with little or no room for improvement.
Abstract:
The quality of water level time series data varies strongly, with periods of high and low quality sensor data. In this paper we present the processing steps that were used to generate high quality water level data from water pressure measured at the Time Series Station (TSS) Spiekeroog. The TSS is positioned in a tidal inlet between the islands of Spiekeroog and Langeoog in the East Frisian Wadden Sea (southern North Sea). The processing steps cover sensor drift, outlier identification, interpolation of data gaps and quality control. A central step is the removal of outliers. For this process an absolute threshold of 0.25 m/10 min was selected, which still preserves the water level increase and decrease during extreme events, as shown during the quality control process. A second important feature of the data processing is the interpolation of gappy data, which is accomplished with a high certainty of generating trustworthy data. Applying these methods, a 10-year dataset (December 2002-December 2012) of water level information at the TSS was processed, resulting in a seven-year time series (2005-2011).
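A rate-of-change filter followed by gap interpolation, in the spirit of the steps described above, might look like the following Python sketch. The abstract gives only the threshold (0.25 m per 10 min), so everything else is an assumption made for illustration: the function names, the regular 10-minute sampling, the comparison against the last accepted sample, the trusted first sample, linear interpolation and the example values. Sensor drift correction and quality control are not covered.

```python
def flag_spikes(levels, max_change=0.25):
    """Replace a sample by None when it differs from the last accepted sample
    by more than max_change metres per 10-minute step (assumed rule)."""
    cleaned = [levels[0]]  # assumption: the first sample is trusted
    last_good = 0
    for i in range(1, len(levels)):
        steps = i - last_good  # number of 10-minute steps since last accepted sample
        if abs(levels[i] - levels[last_good]) <= max_change * steps:
            cleaned.append(levels[i])
            last_good = i
        else:
            cleaned.append(None)
    return cleaned


def interpolate_gaps(series):
    """Fill None values linearly between the nearest valid neighbours.

    Gaps at the end of the series have no right-hand neighbour and stay None.
    """
    filled = list(series)
    valid = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(valid, valid[1:]):
        for i in range(left + 1, right):
            fraction = (i - left) / (right - left)
            filled[i] = series[left] + fraction * (series[right] - series[left])
    return filled


# Hypothetical water levels (m) sampled every 10 minutes; 3.00 is a spike.
raw = [1.00, 1.10, 1.20, 3.00, 1.40, 1.50]
flagged = flag_spikes(raw)
filled = interpolate_gaps(flagged)
print(flagged)
print(filled)
```

Comparing each sample with the last accepted one, rather than with its immediate predecessor, keeps a single spike from also invalidating the good sample that follows it.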
Abstract:
Thesis (Ph.D.)--University of Washington, 2016-06
Abstract:
Although many of the molecular interactions in kidney development are now well understood, the molecules involved in the specification of the metanephric mesenchyme from surrounding intermediate mesoderm and, hence, the formation of the renal progenitor population are poorly characterized. In this study, cDNA microarrays were used to identify genes enriched in the murine embryonic day 10.5 (E10.5) uninduced metanephric mesenchyme, the renal progenitor population, in comparison with more rostral derivatives of the intermediate mesoderm. Microarray data were analyzed using R statistical software to determine accurately genes differentially expressed between these populations. Microarray outliers were biologically verified, and the spatial expression pattern of these genes at E10.5 and subsequent stages of early kidney development was determined by RNA in situ hybridization. This approach identified 21 genes preferentially expressed by the E10.5 metanephric mesenchyme, including Ewing sarcoma homolog, 14-3-3 theta, retinoic acid receptor-alpha, stearoyl-CoA desaturase 2, CD24, and cadherin-11, that may be important in formation of renal progenitor cells. Cell surface proteins such as CD24 and cadherin-11 that were strongly and specifically expressed in the uninduced metanephric mesenchyme and mark the renal progenitor population may prove useful in the purification of renal progenitor cells by FACS. These findings may assist in the isolation and characterization of potential renal stem cells for use in cellular therapies for kidney disease.