337 results for Outliers


Relevance:

10.00%

Publisher:

Abstract:

Genome-scale metabolic models promise important insights into cell function. However, the definition of pathways and functional network modules within these models, and in the biochemical literature in general, is often based on intuitive reasoning. Although mathematical methods have been proposed to identify modules, which are defined as groups of reactions with correlated fluxes, there is a need for experimental verification. We show here that multivariate statistical analysis of the NMR-derived intra- and extracellular metabolite profiles of single-gene deletion mutants in specific metabolic pathways in the yeast Saccharomyces cerevisiae identified outliers whose profiles were markedly different from those of the other mutants in their respective pathways. Application of flux coupling analysis to a metabolic model of this yeast showed that the deleted gene in an outlying mutant encoded an enzyme that was not part of the same functional network module as the other enzymes in the pathway. We suggest that metabolomic methods such as this, which do not require any knowledge of how a gene deletion might perturb the metabolic network, provide an empirical method for validating and ultimately refining the predicted network structure.
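
The abstract does not spell out the multivariate procedure, but the idea of screening for mutants whose metabolite profiles are markedly different from the other mutants in the same pathway can be sketched as below. The function name, the median/MAD scaling and the threshold of 3 are illustrative assumptions, not the authors' method.

```python
import numpy as np

def robust_profile_outliers(profiles, threshold=3.0):
    """Flag mutants whose metabolite profile deviates strongly from the rest.

    profiles: (n_mutants, n_metabolites) matrix of metabolite intensities for
              the mutants assigned to one pathway.
    Returns the row indices of candidate outlier mutants.
    """
    # Centre with the median and scale with the MAD so the outliers we are
    # looking for do not dominate the location/scale estimates themselves.
    med = np.median(profiles, axis=0)
    mad = np.median(np.abs(profiles - med), axis=0) + 1e-12
    z = (profiles - med) / mad
    score = np.sqrt((z ** 2).mean(axis=1))   # one robust distance per mutant
    return np.where(score > threshold)[0]
```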

Relevance:

10.00%

Publisher:

Abstract:

Age–depth models form the backbone of most palaeoenvironmental studies. However, procedures for constructing chronologies vary between studies, they are usually not explained sufficiently, and some are inadequate for handling calibrated radiocarbon dates. An alternative method based on importance sampling through calibrated dates is proposed. Dedicated R code is presented which works with calibrated radiocarbon as well as other dates, and provides a simple, systematic, transparent, documented and customizable alternative. The code automatically produces age–depth models, enabling exploration of the impacts of different assumptions (e.g., model type, hiatuses, age offsets, outliers, and extrapolation).
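
The dedicated code mentioned in the abstract is R; as a minimal Python sketch of the sampling-through-calibrated-dates idea, the function below repeatedly draws one age per dated depth from its calibrated distribution, discards draws with age reversals, and interpolates between dated depths. The rejection rule, linear interpolation and 95% envelope are assumptions for the example, not the published procedure.

```python
import numpy as np

def sample_age_depth_models(depths, cal_ages, cal_probs, query_depths,
                            n_draws=1000, seed=0):
    """Build simple age-depth models by sampling through calibrated dates.

    depths:       (n_dates,) depths of the dated levels, sorted increasing
    cal_ages:     list of arrays, the calendar-age grid of each calibrated date
    cal_probs:    list of arrays, the matching calibrated probabilities
    query_depths: depths at which an age estimate is wanted
    """
    rng = np.random.default_rng(seed)
    draws, attempts = [], 0
    while len(draws) < n_draws and attempts < 50 * n_draws:
        attempts += 1
        # one age per dated depth, drawn from its calibrated distribution
        ages = np.array([rng.choice(a, p=p / p.sum())
                         for a, p in zip(cal_ages, cal_probs)])
        if np.all(np.diff(ages) > 0):            # discard age reversals
            draws.append(np.interp(query_depths, depths, ages))
    draws = np.array(draws)
    # point estimate and a 95% envelope at every queried depth
    return draws.mean(axis=0), np.percentile(draws, [2.5, 97.5], axis=0)
```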

Relevance:

10.00%

Publisher:

Abstract:

This paper presents Yagada, an algorithm to search labelled graphs for anomalies using both structural data and numeric attributes. Yagada is explained using several security-related examples and validated with experiments on a physical Access Control database. Quantitative analysis shows that in the upper range of anomaly thresholds, Yagada detects twice as many anomalies as the best-performing numeric discretization algorithm. Qualitative evaluation shows that the detected anomalies are meaningful, representing a combination of structural irregularities and numerical outliers.
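
Yagada itself is not reproduced here; the sketch below only illustrates the combination the abstract describes, discretizing a numeric attribute and scoring structural patterns by their rarity. The function names, equal-frequency binning and negative-log-frequency score are assumptions for illustration.

```python
import numpy as np
from collections import Counter

def discretize(values, n_bins=5):
    """Equal-frequency discretization of a numeric attribute into bin labels."""
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)

def edge_anomaly_scores(edges, numeric_attr, n_bins=5):
    """Score labelled edges by the rarity of (source label, target label, bin).

    edges: list of (source_label, target_label) tuples
    numeric_attr: one numeric attribute per edge (e.g. a door-access timestamp)
    """
    bins = discretize(np.asarray(numeric_attr, dtype=float), n_bins)
    patterns = [(s, t, int(b)) for (s, t), b in zip(edges, bins)]
    counts = Counter(patterns)
    total = len(patterns)
    # rare combined patterns (structure plus discretized value) get high scores
    return np.array([-np.log(counts[p] / total) for p in patterns])
```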

Relevance:

10.00%

Publisher:

Abstract:

Radiocarbon dating is routinely used in paleoecology to build chronologies of lake and peat sediments, aiming at inferring a model that would relate the sediment depth with its age. We present a new approach for chronology building (called “Bacon”) that has received enthusiastic attention from paleoecologists. Our methodology is based on controlling core accumulation rates using a gamma autoregressive semiparametric model with an arbitrary number of subdivisions along the sediment. Using prior knowledge about accumulation rates is crucial, and informative priors are routinely used. Since many sediment cores are currently analyzed, using different data sets and prior distributions, a robust (adaptive) MCMC is very useful. We use the t-walk (Christen and Fox, 2010), a self-adjusting, robust MCMC sampling algorithm that works acceptably well in many situations. Outliers are also addressed using a recent approach that considers a Student-t model for radiocarbon data. Two examples are presented here, that of a peat core and a core from a lake, and our results are compared with other approaches.
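
The core idea of the accumulation model can be illustrated without the MCMC machinery: successive sections get accumulation rates that mix the previous rate with a fresh gamma draw, giving smooth but flexible age-depth curves. The sketch below is a forward simulation of that prior only; the parameter names and values are assumptions, not Bacon's implementation.

```python
import numpy as np

def simulate_gamma_ar_age_depth(depth_step=5.0, n_sections=40, acc_mean=20.0,
                                acc_shape=1.5, memory=0.7, top_age=0.0, seed=0):
    """Draw one age-depth realisation from a gamma autoregressive
    accumulation-rate prior (the idea behind the model, not its implementation).

    Accumulation rates (yr/cm) of successive sections mix the previous
    section's rate with a new gamma draw controlled by a memory parameter.
    """
    rng = np.random.default_rng(seed)
    rates = np.empty(n_sections)
    rates[0] = rng.gamma(acc_shape, acc_mean / acc_shape)
    for i in range(1, n_sections):
        innovation = rng.gamma(acc_shape, acc_mean / acc_shape)
        rates[i] = memory * rates[i - 1] + (1 - memory) * innovation
    depths = np.arange(n_sections + 1) * depth_step
    ages = top_age + np.concatenate([[0.0], np.cumsum(rates * depth_step)])
    return depths, ages
```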

Relevance:

10.00%

Publisher:

Abstract:

The environmental quality of land can be assessed by calculating relevant threshold values, which differentiate between concentrations of elements resulting from geogenic and diffuse anthropogenic sources and concentrations generated by point sources of elements. A simple process allowing the calculation of these typical threshold values (TTVs) was applied across a region of highly complex geology (Northern Ireland) to six elements of interest: arsenic, chromium, copper, lead, nickel and vanadium. Three methods for identifying domains (areas where a readily identifiable factor can be shown to control the concentration of an element) were used: k-means cluster analysis, boxplots and empirical cumulative distribution functions (ECDF). The ECDF method was most efficient at determining areas of both elevated and reduced concentrations and was used to identify domains in this investigation. Two statistical methods for calculating normal background concentrations (NBCs) and upper limits of geochemical baseline variation (ULBLs), currently used in conjunction with legislative regimes in the UK and Finland respectively, were applied within each domain. The NBC methodology was constructed to run within a specific legislative framework, and its use on this soil geochemical data set was influenced by the presence of skewed distributions and outliers. In contrast, the ULBL methodology was found to calculate more appropriate TTVs that were generally more conservative than the NBCs. TTVs indicate what a "typical" concentration of an element would be within a defined geographical area and should be considered alongside the risk that each of the elements poses in these areas to determine potential risk to receptors.
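
The exact NBC and ULBL formulae are defined by the respective UK and Finnish frameworks and are not reproduced here. As a stand-in, the sketch below shows an ECDF helper of the kind used to compare domains, plus one generic robust upper limit (median + k·MAD on log concentrations); both function names and the choice of limit are assumptions for illustration.

```python
import numpy as np

def ecdf(values):
    """Empirical cumulative distribution function of a set of concentrations."""
    x = np.sort(np.asarray(values, dtype=float))
    return x, np.arange(1, len(x) + 1) / len(x)

def robust_upper_limit(values, k=2.0):
    """A simple robust upper limit for 'typical' concentrations in a domain:
    median + k * MAD on log-transformed data. Only an illustration, not the
    NBC or ULBL formulae used in the study; concentrations must be positive."""
    logged = np.log(np.asarray(values, dtype=float))
    med = np.median(logged)
    mad = np.median(np.abs(logged - med))
    return float(np.exp(med + k * 1.4826 * mad))
```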

Relevance:

10.00%

Publisher:

Abstract:

Several theories of legislative organisation have been proposed to explain committee selection in American legislatures, but do these theories travel outside the United States? This paper tests whether these theories apply to data from the Canadian House of Commons. It was found that the distributive and partisan models of legislative organisation explain committee composition in Canada. In many cases, committees in the House of Commons are made up of preference outliers. As predicted by partisan models, it was also found that the governing party stacks committees with its members, but this is conditional upon the strength of the governing party.

Relevance:

10.00%

Publisher:

Abstract:

Raptors that consume game species may ingest lead fragments or shot embedded in their prey's flesh. Threatened Spanish imperial eagles Aquila adalberti feed on greylag geese in southern Spain in winter, and often ingest lead shot. We analysed bone and feather samples from 65 Spanish imperial eagle museum specimens collected between 1980 and 1999, to investigate the prevalence of elevated lead concentrations. Four of 34 birds (12%) had very elevated bone lead concentrations. All four birds were young and the concentrations were outliers to the distribution, suggesting probable exposure to lead gunshot. Excluding these elevated lead outliers, bone lead concentrations were correlated with the bird's age at death. Three of 41 feathers (7%) had elevated lead concentrations, indicative of high exposure during feather formation. When these outliers were omitted, feather lead concentration was correlated with the age of museum specimens, suggesting that a high proportion of feather lead was exogenous, deposited after specimen collection. Therefore, careful interpretation of feather lead concentrations is required to separate endogenous and exogenous lead. We discuss the potential significance of lead poisoning in Spanish imperial eagles and other raptors, and recommend measures for its reduction.
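
The analysis described above has two generic steps: flag upper outliers in the concentration distribution, then correlate the remaining values with age. A minimal version of that workflow might look like the sketch below; the IQR rule and Spearman correlation are assumptions, not necessarily the tests used in the paper.

```python
import numpy as np
from scipy import stats

def split_upper_outliers(values):
    """Separate 'typical' measurements from upper outliers (> Q3 + 1.5*IQR)."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    mask = values <= q3 + 1.5 * (q3 - q1)
    return values[mask], values[~mask], mask

def age_correlation_without_outliers(lead, age):
    """Rank correlation between lead concentration and age after dropping
    the upper outliers attributed to ingested shot."""
    lead, age = np.asarray(lead, dtype=float), np.asarray(age, dtype=float)
    _, _, mask = split_upper_outliers(lead)
    return stats.spearmanr(lead[mask], age[mask])
```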

Relevance:

10.00%

Publisher:

Abstract:

BACKGROUND: Currently, two main technologies are used for screening of DNA copy number: the BAC (Bacterial Artificial Chromosome) and the recently developed oligonucleotide-based CGH (Chromosomal Comparative Genomic Hybridization) arrays, which are capable of detecting small genomic regions with amplification or deletion. The correlation as well as the discriminative power of these platforms has never been compared statistically on a significant set of human patient samples.

RESULTS: In this paper, we present an exhaustive comparison between the two CGH platforms, undertaken at two independent sites using the same batch of DNA from 19 advanced prostate cancers. The comparison was performed directly on the raw data and a significant correlation was found between the two platforms. The correlation was greatly improved when the data were averaged over large chromosomic regions using a segmentation algorithm. In addition, this analysis has enabled the development of a statistical model to discriminate BAC outliers that might indicate microevents. These microevents were validated by the oligo platform results.
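
The segmentation algorithm and the outlier model are not specified in the abstract; the sketch below only illustrates the two steps it mentions, averaging log ratios over segments and flagging probes that deviate strongly from their own segment, using assumed names and a simple k-sigma rule rather than the paper's statistical model.

```python
import numpy as np

def segment_means(log_ratios, breakpoints):
    """Average log2 ratios within segments defined by breakpoint indices."""
    log_ratios = np.asarray(log_ratios, dtype=float)
    bounds = [0, *breakpoints, len(log_ratios)]
    return [float(np.mean(log_ratios[a:b])) for a, b in zip(bounds[:-1], bounds[1:])]

def candidate_microevents(log_ratios, breakpoints, k=3.0):
    """Flag probes deviating strongly from their segment's mean, a simple
    stand-in for a statistical outlier model over segmented data."""
    log_ratios = np.asarray(log_ratios, dtype=float)
    bounds = [0, *breakpoints, len(log_ratios)]
    flags = np.zeros(len(log_ratios), dtype=bool)
    for a, b in zip(bounds[:-1], bounds[1:]):
        seg = log_ratios[a:b]
        mu, sd = seg.mean(), seg.std() + 1e-12
        flags[a:b] = np.abs(seg - mu) > k * sd
    return np.where(flags)[0]
```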

CONCLUSION: This article presents a genome-wide statistical validation of the oligo array platform on a large set of patient samples and demonstrates statistically its superiority over the BAC platform for the identification of chromosomic events. Taking advantage of a large set of human samples treated by the two technologies, a statistical model has been developed to show that the BAC platform could also detect microevents.

Relevance:

10.00%

Publisher:

Abstract:

An outlier-removal-based data cleaning technique is proposed to clean manually pre-segmented human skin data in colour images. The 3-dimensional colour data is projected onto three 2-dimensional planes, from which outliers are removed. The cleaned 2-dimensional projections are merged to yield clean 3D RGB data. This data is finally used to build a look-up table and a single Gaussian classifier for the purpose of human skin detection in colour images.
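
A minimal sketch of the described pipeline follows, assuming density-based outlier removal on each 2D projection (the abstract does not give the actual outlier criterion); the grid size, density threshold and function names are illustrative.

```python
import numpy as np

def clean_skin_pixels(rgb, grid=32, min_density=0.001):
    """Project RGB skin samples onto the RG, RB and GB planes, drop pixels
    that fall in low-density 2D cells on any plane, and keep the intersection.

    rgb: (n, 3) array of skin-labelled pixel values in [0, 255]
    """
    rgb = np.asarray(rgb, dtype=float)
    keep = np.ones(len(rgb), dtype=bool)
    for i, j in [(0, 1), (0, 2), (1, 2)]:            # the three 2D projections
        hist, xe, ye = np.histogram2d(rgb[:, i], rgb[:, j],
                                      bins=grid, range=[[0, 256], [0, 256]])
        xi = np.clip(np.digitize(rgb[:, i], xe) - 1, 0, grid - 1)
        yi = np.clip(np.digitize(rgb[:, j], ye) - 1, 0, grid - 1)
        density = hist[xi, yi] / len(rgb)
        keep &= density >= min_density               # outliers sit in sparse cells
    return rgb[keep]

def fit_single_gaussian(clean_rgb):
    """Mean and covariance of the cleaned data, for a single-Gaussian skin model."""
    return clean_rgb.mean(axis=0), np.cov(clean_rgb, rowvar=False)
```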

Relevance:

10.00%

Publisher:

Abstract:

High gene flow is considered the norm for most marine organisms and is expected to limit their ability to adapt to local environments. Few studies have directly compared the patterns of differentiation at neutral and selected gene loci in marine organisms. We analysed a transcriptome-derived panel of 281 SNPs in Atlantic herring (Clupea harengus), a highly migratory small pelagic fish, to elucidate neutral and selected genetic variation among populations and to identify candidate genes for environmental adaptation. We analysed 607 individuals from 18 spawning locations in the northeast Atlantic, including two temperature clines (5-12 °C) and two salinity clines (5-35‰). By combining genome scan and landscape genetic analyses, four genetically distinct groups of herring were identified: Baltic Sea, Baltic-North Sea transition area, North Sea/British Isles and North Atlantic; notably, samples exhibited divergent clustering patterns for neutral and selected loci. We found statistically strong evidence for divergent selection at 16 outlier loci on a global scale, and significant correlations with temperature and salinity at nine loci. On regional scales, we identified two outlier loci with parallel patterns across temperature clines and five loci associated with temperature in the North Sea/North Atlantic. Likewise, we found seven replicated outliers, of which five were significantly associated with low salinity across both salinity clines. Our results reveal a complex pattern of varying spatial genetic variation among outlier loci, likely reflecting adaptations to local environments. In addition to disclosing the fine scale of local adaptation in a highly vagile species, our data emphasize the need to preserve functionally important biodiversity.
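
The paper's genome scan relied on dedicated outlier tests; as a rough illustration of the underlying quantity, the sketch below computes a basic Wright-style FST per SNP from population allele frequencies and flags the upper tail. The simple FST estimator, the 99th-percentile cut-off and the function names are assumptions, not the study's methods.

```python
import numpy as np

def fst_per_locus(freqs, sizes):
    """Basic Wright-style FST per SNP from population allele frequencies.

    freqs: (n_pops, n_loci) array of reference-allele frequencies
    sizes: (n_pops,) array of sample sizes, used as weights
    """
    freqs = np.asarray(freqs, dtype=float)
    w = np.asarray(sizes, dtype=float) / np.sum(sizes)
    p_bar = w @ freqs                        # weighted mean frequency per locus
    h_t = 2 * p_bar * (1 - p_bar)            # expected total heterozygosity
    h_s = w @ (2 * freqs * (1 - freqs))      # mean within-population heterozygosity
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(h_t > 0, (h_t - h_s) / h_t, 0.0)

def flag_outlier_loci(fst, quantile=0.99):
    """Flag loci in the extreme upper tail of the empirical FST distribution."""
    return np.where(fst > np.quantile(fst, quantile))[0]
```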

Relevance:

10.00%

Publisher:

Abstract:

The problem of detecting spatially coherent groups of data that exhibit anomalous behavior has started to attract attention due to applications across areas such as epidemic analysis and weather forecasting. Earlier efforts from the data mining community have largely focused on finding outliers, individual data objects that display deviant behavior. Such point-based methods are not easy to extend to find groups of data that exhibit anomalous behavior. Scan statistics are methods from the statistics community that have considered the problem of identifying regions where data objects exhibit a behavior that is atypical of the general dataset. The spatial scan statistic and methods that build upon it mostly adopt the framework of defining a character for regions (e.g., circular or elliptical) of objects and repeatedly sampling regions of such character, followed by applying a statistical test for anomaly detection. In the past decade, there have been efforts from the statistics community to enhance the efficiency of scan statistics as well as to enable discovery of arbitrarily shaped anomalous regions. On the other hand, the data mining community has started to look at determining anomalous regions that have behavior divergent from their neighborhood. In this chapter, we survey the space of techniques for detecting anomalous regions on spatial data from across the data mining and statistics communities, while outlining connections to well-studied problems in clustering and image segmentation. We analyze the techniques systematically by categorizing them appropriately to provide a structured bird's-eye view of the work on anomalous region detection; we hope that this will encourage better cross-pollination of ideas across communities to help advance the frontier in anomaly detection.
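
As a concrete reference point for the scan-statistic framework described above, here is a minimal circular spatial scan in the Kulldorff style: a Poisson log-likelihood ratio is scored over candidate circles centred on the data locations. The helper names and the brute-force search over radii are illustrative, and the Monte Carlo significance assessment that normally follows is omitted.

```python
import numpy as np

def poisson_llr(c_in, n_in, c_tot, n_tot):
    """Log-likelihood ratio for a candidate region with c_in cases out of
    n_in at-risk units, compared against the whole map (Poisson model)."""
    e_in = c_tot * n_in / n_tot               # expected cases inside under the null
    if e_in == 0 or c_in <= e_in:
        return 0.0
    c_out, e_out = c_tot - c_in, c_tot - e_in
    llr = c_in * np.log(c_in / e_in)
    if c_out > 0:
        llr += c_out * np.log(c_out / e_out)
    return float(llr)

def circular_scan(coords, cases, pop, radii):
    """Score circles centred on each location; return the best (centre, radius, LLR)."""
    coords = np.asarray(coords, dtype=float)
    cases, pop = np.asarray(cases, dtype=float), np.asarray(pop, dtype=float)
    c_tot, n_tot = cases.sum(), pop.sum()
    best = (None, None, 0.0)
    for i, centre in enumerate(coords):
        d = np.linalg.norm(coords - centre, axis=1)
        for r in radii:
            inside = d <= r
            llr = poisson_llr(cases[inside].sum(), pop[inside].sum(), c_tot, n_tot)
            if llr > best[2]:
                best = (i, r, llr)
    return best
```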

Relevance:

10.00%

Publisher:

Abstract:

Statistical techniques are fundamental in science, and linear regression analysis is perhaps one of the most widely used methodologies. It is well known from the literature that, under certain conditions, linear regression is an extremely powerful statistical tool. Unfortunately, in practice, some of those conditions are rarely satisfied, the regression models become ill-posed, and the traditional estimation methods can no longer be applied. This work presents some contributions to maximum entropy theory in the estimation of ill-posed models, in particular in the estimation of linear regression models with small samples affected by collinearity and outliers. The research is developed along three lines: the estimation of technical efficiency with state-contingent production frontiers, the estimation of the ridge parameter in ridge regression and, finally, new developments in maximum entropy estimation. In the estimation of technical efficiency with state-contingent production frontiers, the work shows that maximum entropy estimators outperform the maximum likelihood estimator. This good performance is most notable in models with few observations per state and in models with a large number of states, which are commonly affected by collinearity. It is hoped that the use of maximum entropy estimators will contribute to the much-desired increase in empirical work with these production frontiers. In ridge regression, the greatest challenge is the estimation of the ridge parameter. Although numerous procedures are available in the literature, none outperforms all the others. In this work, a new estimator of the ridge parameter is proposed that combines the analysis of the ridge trace with maximum entropy estimation. The results obtained in simulation studies suggest that this new estimator is one of the best procedures in the literature for estimating the ridge parameter. The Leuven maximum entropy estimator is based on the least squares method, on Shannon entropy and on concepts from quantum electrodynamics. This estimator overcomes the main criticism levelled at the generalized maximum entropy estimator, since it dispenses with the supports for the parameters and errors of the regression model. This work presents new contributions to maximum entropy theory in the estimation of ill-posed models, based on the Leuven maximum entropy estimator, information theory and robust regression. The estimators developed show good performance in linear regression models with small samples affected by collinearity and outliers. Finally, some computational code for maximum entropy estimation is presented, thereby adding to the scarce computational resources currently available.
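
The thesis couples the ridge trace with a maximum entropy criterion to choose the ridge parameter; that selection step is not reproduced here. The sketch below only computes the ridge trace itself, i.e. the ridge estimates over a grid of ridge parameters, with assumed function and argument names.

```python
import numpy as np

def ridge_trace(X, y, ridge_values):
    """Ridge regression coefficients over a grid of ridge parameters.

    X: (n, p) design matrix, y: (n,) response, ridge_values: iterable of k >= 0.
    Returns an array of shape (len(ridge_values), p), one coefficient vector
    per ridge parameter (the ridge trace).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    coefs = []
    for k in ridge_values:
        # beta(k) = (X'X + k I)^{-1} X'y, solved without forming the inverse
        beta = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
        coefs.append(beta)
    return np.array(coefs)
```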

Relevance:

10.00%

Publisher:

Abstract:

In this study, we utilise a novel approach to segment out the ventricular system in a series of high-resolution T1-weighted MR images. We present a fast reconstruction method for the brain ventricles. The method is based on processing brain sections and establishing a fixed number of landmarks on those sections to reconstruct the 3D surface of the ventricles. Automated landmark extraction is accomplished through the use of a self-organising network, the growing neural gas (GNG), which is able to topographically map the low dimensionality of the network to the high dimensionality of the contour manifold without requiring a priori knowledge of the input space structure. Moreover, our GNG landmark method is tolerant to noise and eliminates outliers. Our method accelerates the classical surface reconstruction and filtering processes. The proposed method offers higher accuracy than methods of similar efficiency, such as Voxel Grid.
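
For readers unfamiliar with GNG, a compact version of the standard Fritzke-style update loop is sketched below. The parameter values and the simplifications (no node removal, a cap on the node count standing in for the fixed landmark budget) are assumptions for illustration, not the paper's pipeline, which works on per-section contours before surface reconstruction.

```python
import numpy as np

def growing_neural_gas(points, max_nodes=50, lam=100, eps_b=0.05, eps_n=0.006,
                       age_max=50, alpha=0.5, d=0.995, n_iter=20000, seed=0):
    """A compact Growing Neural Gas over a 2D/3D point cloud.

    Returns the node positions (the learned landmarks) and the edge set.
    Node removal is omitted for brevity; isolated nodes simply stop being
    updated, which already limits the pull of isolated outlier points.
    """
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    nodes = [points[rng.integers(len(points))].copy() for _ in range(2)]
    error = [0.0, 0.0]
    edges = {}                                    # frozenset({i, j}) -> age

    for t in range(1, n_iter + 1):
        x = points[rng.integers(len(points))]
        dist2 = np.array([np.sum((x - w) ** 2) for w in nodes])
        s1, s2 = np.argsort(dist2)[:2]            # winner and runner-up
        error[s1] += dist2[s1]
        nodes[s1] += eps_b * (x - nodes[s1])      # move the winner toward x
        for e in list(edges):                     # age the winner's edges and
            if s1 in e:                           # drag its neighbours slightly
                j = next(iter(e - {s1}))
                nodes[j] += eps_n * (x - nodes[j])
                edges[e] += 1
        edges[frozenset((int(s1), int(s2)))] = 0  # refresh/create winner edge
        edges = {e: a for e, a in edges.items() if a <= age_max}

        if t % lam == 0 and len(nodes) < max_nodes:
            q = int(np.argmax(error))             # node with the largest error
            nbrs = [next(iter(e - {q})) for e in edges if q in e]
            if nbrs:
                f = max(nbrs, key=lambda j: error[j])
                nodes.append(0.5 * (nodes[q] + nodes[f]))
                edges.pop(frozenset((q, f)), None)
                r = len(nodes) - 1
                edges[frozenset((q, r))] = 0
                edges[frozenset((f, r))] = 0
                error[q] *= alpha
                error[f] *= alpha
                error.append(error[q])            # new node inherits q's error
        error = [e_ * d for e_ in error]          # global error decay
    return np.array(nodes), edges
```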

Relevance:

10.00%

Publisher:

Abstract:

A Work Project, presented as part of the requirements for the Award of a Masters Double Degree in Economics from the Nova School of Business and Economics and University of Maastricht

Relevance:

10.00%

Publisher:

Abstract:

ABSTRACT - Context: quality assessment as a potentially important topic for users and providers of health care. The mortality rate as an outcome measure with adequate risk adjustment. The existence of certain structural characteristics of hospitals associated with lower mortality. Objectives: to identify differences in hospital performance and mortality rates and to investigate which structural characteristics explain those differences. Methods: inpatient episodes for the diseases with the highest hospital mortality were selected. The performance measure considered was the comparison between observed mortality and expected mortality, calculated from the Disease Staging mortality prediction scale, recalibrated for Portugal. The performance measure was analysed by hospital, disease and disease group. The ranking of hospitals by performance was compared with their ranking by observed mortality rate. Performance within each hospital was analysed for a set of selected diseases. The relationship between the performance measure and the variables "number of episodes", "technological index" and "severity of the patients treated" was analysed through linear regression, for all episodes combined and for each disease and disease group. Results: 379,074 episodes were included, grouped into 21 diseases and 8 disease groups and treated in 81 hospitals. The observed mortality rate was 12%. There were differences in performance between hospitals, some of which stood out for their better or worse performance. The limitations of the crude mortality rate as an instrument of performance analysis were evident in the context of hospitals whose patients have different risk levels. Moreover, analysing the hospital as a whole or each of its parts gives distinct results, given the existence of different performance levels within the hospital. Finally, the relationship between volume and performance, where it exists, was found in almost all cases to be non-linear and the inverse of that reported in the literature.
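
A minimal sketch of the observed-versus-expected comparison described in the methods follows, assuming per-episode death probabilities from the (recalibrated) predictive scale are already available; the function and field names are illustrative, not those of the study.

```python
import numpy as np

def hospital_performance(pred_death_prob, died, hospital_ids):
    """Observed-minus-expected in-hospital mortality per hospital.

    pred_death_prob: numpy array of per-episode death probabilities from a
                     risk-adjustment scale (assumed already recalibrated)
    died:            numpy array, 1 if the episode ended in death, else 0
    hospital_ids:    numpy array with one hospital identifier per episode
    """
    results = {}
    for h in np.unique(hospital_ids):
        mask = hospital_ids == h
        observed = died[mask].sum()
        expected = pred_death_prob[mask].sum()
        results[h] = {
            "episodes": int(mask.sum()),
            "observed_deaths": int(observed),
            "expected_deaths": float(expected),
            # positive values indicate more deaths than the risk profile predicts
            "o_minus_e_rate": float((observed - expected) / mask.sum()),
        }
    return results
```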