790 results for Datasets


Relevance:

10.00%

Abstract:

This paper proposes a template for modelling complex datasets that integrates traditional statistical modelling approaches with more recent advances in statistics and modelling through an exploratory framework. Our approach builds on the well-known and long-standing idea of 'good practice in statistics' by establishing a comprehensive framework for modelling that focuses on exploration, prediction, interpretation and reliability assessment, a relatively new idea that allows individual assessment of predictions. The integrated framework we present comprises two stages. The first involves the use of exploratory methods to help visually understand the data and identify a parsimonious set of explanatory variables. The second encompasses a two-step modelling process, in which the use of non-parametric methods such as decision trees and generalized additive models is promoted to identify important variables and their modelling relationship with the response before a final predictive model is considered. We focus on fitting the predictive model using parametric, non-parametric and Bayesian approaches. This paper is motivated by a medical problem where interest focuses on developing a risk stratification system for morbidity of 1,710 cardiac patients given a suite of demographic, clinical and preoperative variables. Although the methods we use are applied specifically to this case study, they can be applied across any field, irrespective of the type of response.

Relevance:

10.00%

Abstract:

Signal peptides and transmembrane helices both contain a stretch of hydrophobic amino acids. This common feature makes it difficult for signal peptide and transmembrane helix predictors to correctly assign identity to stretches of hydrophobic residues near the N-terminal methionine of a protein sequence. The inability to reliably distinguish between N-terminal transmembrane helix and signal peptide is an error with serious consequences for the prediction of protein secretory status or transmembrane topology. In this study, we report a new method for differentiating protein N-terminal signal peptides and transmembrane helices. Based on the sequence features extracted from hydrophobic regions (amino acid frequency, hydrophobicity, and the start position), we set up discriminant functions and examined them on non-redundant datasets with jackknife tests. This method can incorporate other signal peptide prediction methods and achieve higher prediction accuracy. For Gram-negative bacterial proteins, 95.7% of N-terminal signal peptides and transmembrane helices can be correctly predicted (coefficient 0.90). Given a sensitivity of 90%, transmembrane helices can be identified from signal peptides with a precision of 99% (coefficient 0.92). For eukaryotic proteins, 94.2% of N-terminal signal peptides and transmembrane helices can be correctly predicted with coefficient 0.83. Given a sensitivity of 90%, transmembrane helices can be identified from signal peptides with a precision of 87% (coefficient 0.85). The method can be used to complement current transmembrane protein prediction and signal peptide prediction methods to improve their prediction accuracies. (C) 2003 Elsevier Inc. All rights reserved.

Relevance:

10.00%

Abstract:

This paper describes a process-based metapopulation dynamics and phenology model of prickly acacia, Acacia nilotica, an invasive alien species in Australia. The model, SPAnDX, describes the interactions between riparian and upland sub-populations of A. nilotica within livestock paddocks, including the effects of extrinsic factors such as temperature, soil moisture availability and atmospheric concentrations of carbon dioxide. The model includes the effects of management events such as changing the livestock species or stocking rate, applying fire, and herbicide application. The predicted population behaviour of A. nilotica was sensitive to climate. Using 35 years of daily weather datasets for five representative sites spanning the range of conditions under which A. nilotica is found in Australia, the model predicted biomass levels that closely accord with expected values at each site. SPAnDX can be used as a decision-support tool in integrated weed management, and to explore the sensitivity of cultural management practices to climate change throughout the range of A. nilotica. The cohort-based DYMEX modelling package used to build and run SPAnDX provided several advantages over more traditional population modelling approaches (e.g. an appropriate specific formalism (discrete time, cohort-based, process-oriented), user-friendly graphical environment, extensible library of reusable components, and useful and flexible input/output support framework). (C) 2003 Published by Elsevier Science B.V.

Relevance:

10.00%

Abstract:

Translabial ultrasound is increasingly being used for the assessment of women presenting with pelvic floor dysfunction and incontinence (1,2). However, there is little information on normal values for bladder neck descent, with the two available studies disagreeing widely (3,4). No data have so far been published on mobility of the central and posterior compartments, which can now also be assessed by ultrasound (5). This study presents normal values for urethral, bladder, cervical and rectal mobility in a cohort of young, stress continent, nulliparous nonpregnant women. Methods: A total of 118 nonpregnant nulliparous Caucasian women between 18 and 23 years of age were recruited for an ongoing twin study of pelvic floor function. Translabial ultrasound assessment of pelvic organ mobility was undertaken supine and after bladder emptying (6,7). The best of at least three effective Valsalva manoeuvres was used for evaluation, with no attempts at standardization of Valsalva pressure. Parameters of anterior compartment mobility were obtained by the use of on-screen calipers; cervical and rectal descent were evaluated on printouts. All examinations were carried out under direct supervision of the first author or by personnel trained by him for at least 100 consecutive assessments. Results: The median age of participants in this study was 20 (range 18-23). Mean body mass index was 23 (range 16.9-36.7). Of 118 women, two were completely unable to perform a Valsalva manoeuvre despite repeated efforts at teaching and were excluded from analysis, as were ten women who complained of urinary stress incontinence, leaving 106 datasets. Average measurements for the parameters ‘retrovesical angle at rest’ (RVA-R) and on Valsalva (RVA-S), urethral rotation, bladder neck mobility, cystocele descent, cervical descent and descent of the rectal ampulla are given in Table 1.

Relevance:

10.00%

Abstract:

Approximate calibration data is often available for modern consumer cameras, making applications such as 3D reconstruction or photo registration easier than in the purely uncalibrated setting. In this paper we address the setting with calibrated-uncalibrated image pairs: for one image the intrinsic parameters are assumed to be known, whereas the second view has unknown distortion and calibration parameters. This situation arises, for example, when one would like to register archive imagery to recently taken photos. A commonly adopted strategy for determining epipolar geometry is based on feature matching and minimal solvers inside a RANSAC framework. However, only very few existing solutions apply to the calibrated-uncalibrated setting. We propose a simple and numerically stable two-step scheme that first estimates the radial distortion parameters and subsequently the focal length using novel solvers. We demonstrate the performance on synthetic and real datasets.

Relevance:

10.00%

Abstract:

One of the current frontiers in the clinical management of Pectus Excavatum (PE) patients is the prediction of the surgical outcome prior to the intervention. This can be done through computerized simulation of the Nuss procedure, which requires an anatomically correct representation of the costal cartilage. To this end, we take advantage of the costal cartilage tubular structure to detect it through multi-scale vesselness filtering. This information is then used in an interactive 2D initialization procedure which uses anatomical maximum intensity projections of 3D vesselness feature images to efficiently initialize the 3D segmentation process. We identify the cartilage tissue centerlines in these projected 2D images using a livewire approach. We finally refine the 3D cartilage surface through region-based sparse field level-sets. We have tested the proposed algorithm in 6 noncontrast CT datasets from PE patients. A good segmentation performance was found against reference manual contouring, with an average Dice coefficient of 0.75±0.04 and an average mean surface distance of 1.69±0.30mm. The proposed method requires roughly 1 minute for the interactive initialization step, which can positively contribute to an extended use of this tool in clinical practice, since current manual delineation of the costal cartilage can take up to an hour.
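The Dice coefficient quoted in the abstract above measures the overlap between an automatic segmentation and a manual reference. The following is a minimal sketch of that metric only, assuming each segmentation is given as a collection of voxel coordinates; the function name `dice_coefficient` and the toy voxel sets are illustrative and do not come from the paper.

```python
def dice_coefficient(predicted, reference):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two voxel sets."""
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        # Two empty segmentations agree perfectly by convention.
        return 1.0
    overlap = len(predicted & reference)
    return 2 * overlap / (len(predicted) + len(reference))


# Hypothetical toy segmentations: four voxels each, three in common.
automatic = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
manual = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 1)}
score = dice_coefficient(automatic, manual)
print(score)  # 0.75
```

A value of 1 means identical segmentations and 0 means no shared voxels, so the reported average of 0.75 corresponds to substantial but not complete overlap.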

Relevance:

10.00%

Abstract:

In daily cardiology practice, assessment of left ventricular (LV) global function using non-invasive imaging remains central for the diagnosis and follow-up of patients with cardiovascular diseases. Despite the different methodologies currently accessible for LV segmentation in cardiac magnetic resonance (CMR) images, a fast and complete LV delineation is still not widely available for routine use. In this study, a localized anatomically constrained affine optical flow method is proposed for fast and automatic LV tracking throughout the full cardiac cycle in short-axis CMR images. Starting from an automatically delineated LV in the end-diastolic frame, the endocardial and epicardial boundaries are propagated by estimating the motion between adjacent cardiac phases using optical flow. In order to reduce the computational burden, the motion is only estimated in an anatomical region of interest around the tracked boundaries and subsequently integrated into a local affine motion model. Such localized estimation makes it possible to capture complex motion patterns while remaining spatially consistent. The method was validated on 45 CMR datasets taken from the 2009 MICCAI LV segmentation challenge. The proposed approach proved to be robust and efficient, with an average distance error of 2.1 mm and a correlation with reference ejection fraction of 0.98 (1.9 ± 4.5%). Moreover, it proved to be fast, taking 5 seconds to track a full 4D dataset (30 ms per image). Overall, a novel fast, robust and accurate LV tracking methodology was proposed, enabling accurate assessment of relevant global cardiac function indices, such as volumes and ejection fraction.

Relevance:

10.00%

Abstract:

Proteins are biochemical entities consisting of one or more blocks typically folded in a 3D pattern. Each block (a polypeptide) is a single linear sequence of amino acids that are biochemically bonded together. The amino acid sequence in a protein is defined by the sequence of a gene or several genes encoded in the DNA-based genetic code. This genetic code typically uses twenty amino acids, but in certain organisms the genetic code can also include two other amino acids. After linking the amino acids during protein synthesis, each amino acid becomes a residue in a protein, which is then chemically modified, ultimately changing and defining the protein function. In this study, the authors analyze the amino acid sequence using alignment-free methods, aiming to identify structural patterns in sets of proteins and in the proteome, without any other previous assumptions. The paper starts by analyzing amino acid sequence data by means of histograms using fixed length amino acid words (tuples). After creating the initial relative frequency histograms, they are transformed and processed in order to generate quantitative results for information extraction and graphical visualization. Selected samples from two reference datasets are used, and results reveal that the proposed method is able to generate relevant outputs in accordance with current scientific knowledge in domains like protein sequence/proteome analysis.
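The first step described in the abstract above, a relative frequency histogram of fixed-length amino acid words, can be sketched as follows. This is only an illustration under stated assumptions: overlapping words are counted, and the function name `word_frequencies` and the short example sequence are hypothetical rather than taken from the paper.

```python
from collections import Counter


def word_frequencies(sequence, k):
    """Relative frequency of each overlapping length-k word in a sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Slide a window of width k along the sequence, one residue at a time.
    words = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    if not words:
        return {}
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


# Hypothetical 10-residue fragment; it yields 9 overlapping 2-letter words.
freqs = word_frequencies("MKVLAAGMKV", 2)
print(freqs["MK"])  # 2 of the 9 words are "MK"
```

Because the result is normalised by the number of words, histograms from sequences of different lengths can be compared directly, which is what an alignment-free comparison needs.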

Relevance:

10.00%

Abstract:

Background: A common task in analyzing microarray data is to determine which genes are differentially expressed across two (or more) kinds of tissue samples or samples submitted under experimental conditions. Several statistical methods have been proposed to accomplish this goal, generally based on measures of distance between classes. It is well known that biological samples are heterogeneous because of factors such as molecular subtypes or genetic background that are often unknown to the experimenter. For instance, in experiments which involve molecular classification of tumors it is important to identify significant subtypes of cancer. Bimodal or multimodal distributions often reflect the presence of mixtures of subsamples. Consequently, there can be genes differentially expressed in sample subgroups which are missed if the usual statistical approaches are used. In this paper we propose a new graphical tool which identifies not only genes with up and down regulation, but also genes with differential expression in different subclasses, which are usually missed if current statistical methods are used. This tool is based on two measures of distance between samples, namely the overlapping coefficient (OVL) between two densities and the area under the receiver operating characteristic (ROC) curve. The methodology proposed here was implemented in the open-source R software. Results: This method was applied to a publicly available dataset, as well as to a simulated dataset. We compared our results with the ones obtained using some of the standard methods for detecting differentially expressed genes, namely Welch t-statistic, fold change (FC), rank products (RP), average difference (AD), weighted average difference (WAD), moderated t-statistic (modT), intensity-based moderated t-statistic (ibmT), significance analysis of microarrays (samT) and area under the ROC curve (AUC). 
On both datasets all differentially expressed genes with bimodal or multimodal distributions were not selected by all standard selection procedures. We also compared our results with (i) area between ROC curve and rising area (ABCR) and (ii) the test for not proper ROC curves (TNRC). We found our methodology more comprehensive, because it detects both bimodal and multimodal distributions and different variances can be considered on both samples. Another advantage of our method is that we can analyze graphically the behavior of different kinds of differentially expressed genes. Conclusion: Our results indicate that the arrow plot represents a new flexible and useful tool for the analysis of gene expression profiles from microarrays.
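The two distance measures named in the abstract above can be illustrated with a small sketch. It is not the authors' R implementation: the AUC is computed here through its pairwise (Mann-Whitney) interpretation, and the OVL is approximated with a crude shared-bin histogram, which is an assumption made for brevity. The function names and the expression values are hypothetical.

```python
def auc(group_a, group_b):
    """Chance that a value from group_b exceeds one from group_a (ties count 0.5)."""
    wins = sum((y > x) + 0.5 * (y == x) for x in group_a for y in group_b)
    return wins / (len(group_a) * len(group_b))


def ovl(group_a, group_b, bins=10):
    """Histogram approximation of the overlapping coefficient of two samples."""
    lo = min(min(group_a), min(group_b))
    hi = max(max(group_a), max(group_b))
    if hi == lo:
        return 1.0  # all values identical, so the distributions coincide
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so the maximum value falls in the last bin.
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return [c / len(values) for c in counts]

    pa, pb = proportions(group_a), proportions(group_b)
    return sum(min(p, q) for p, q in zip(pa, pb))


# Hypothetical gene: the second group splits into a low and a high subgroup.
control = [5.0, 5.1, 4.9, 5.2, 4.8, 5.0]
treated = [3.0, 3.1, 2.9, 7.0, 7.1, 6.9]
print(auc(control, treated))  # 0.5
print(ovl(control, treated))  # 0.0
```

In this toy case the AUC of 0.5 alone would suggest no difference between the groups, while an OVL of 0 shows that the two samples do not overlap at all. That combination is the kind of bimodal pattern the abstract says single distance measures tend to miss.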

Relevance:

10.00%

Abstract:

Master's degree in Informatics Engineering

Relevance:

10.00%

Abstract:

Master's degree in Informatics Engineering

Relevance:

10.00%

Abstract:

25th Conference of the European Cetacean Society. Long-term datasets on marine mammals: learning from the past to manage the future, Cadiz, Spain, 21-23 March 2011.

Relevance:

10.00%

Abstract:

Conference "Ciência nos Açores - que futuro?" ("Science in the Azores: what future?"), Biblioteca Pública e Arquivo Regional de Ponta Delgada, Largo do Colégio, Ponta Delgada, 7-8 June.

Relevance:

10.00%

Abstract:

The species abundance distribution (SAD) has been a central focus of community ecology for over fifty years, and is currently the subject of widespread renewed interest. The gambin model has recently been proposed as a model that provides a superior fit to commonly preferred SAD models. It has also been argued that the model's single parameter (α) presents a potentially informative ecological diversity metric, because it summarises the shape of the SAD in a single number. Despite this potential, few empirical tests of the model have been undertaken, perhaps because the necessary methods and software for fitting the model have not existed. Here, we derive a maximum likelihood method to fit the model, and use it to undertake a comprehensive comparative analysis of the fit of the gambin model. The functions and computational code to fit the model are incorporated in a newly developed free-to-download R package (gambin). We test the gambin model using a variety of datasets and compare the fit of the gambin model to fits obtained using the Poisson lognormal, logseries and zero-sum multinomial distributions. We found that gambin almost universally provided a better fit to the data and that the fit was consistent for a variety of sample grain sizes. We demonstrate how α can be used to differentiate intelligibly between community structures of Azorean arthropods sampled in different land use types. We conclude that gambin presents a flexible model capable of fitting a wide variety of observed SAD data, while providing a useful index of SAD form in its single fitted parameter. As such, gambin has wide potential applicability in the study of SADs, and ecology more generally.

Relevance:

10.00%

Abstract:

This dissertation addresses the problem of detecting and avoiding moving obstacles ("SAA - Sense And Avoid") for aerial vehicles. In particular, it presents contributions towards solutions that allow unmanned aircraft to be used in non-segregated airspace and for civil applications. These contributions are: an analysis of the SAA problem in civil "UAVs - Unmanned Aerial Vehicles"; the definition of the concept and methodology for designing this type of system; a "benchmarking" proposal for the SAA system, characterising a set of "datasets" suitable for validating detection methods; the corresponding experimental validation of the process and acquisition of "datasets"; an analysis of the state of the art in the detection of "dim point features"; and the design of an architecture for an SAA solution that integrates "ego motion" compensation, together with its validation on a collected "dataset". To support the comparative analysis of different methods and the validation of solutions, the collection of a set of "datasets" of sensor and navigation information was proposed. A set of experiments and experimental scenarios was defined for them. An experimental setup for collecting the "datasets" was designed and implemented, and collection experiments were carried out using manned aircraft. The setup incorporates a high-precision inertial system, two synchronised digital cameras (enabling the analysis of stereo information) and a GPS receiver. The target aircraft carry a GPS receiver with a built-in logger, allowing spatial correlation of the detection results. With this system, data were collected for approach scenarios with different trajectories and environmental conditions, including scenarios in which the detecting device itself is moving. 
The proposed method was validated on the collected datasets. A preliminary analysis showed that the obstacle (an ultralight aircraft) was detected in all frames at distances below 3 km, with success rates of around 95% at distances between 3 and 4 km. The results presented validate the proposed architecture for solving the SAA problem in autonomous aerial vehicles and open very promising prospects for future development with strong technical and scientific as well as socio-economic impact. Incorporating "ego motion" information provides a strong increase in performance.