944 results for Data processing methods
Resumo:
This paper describes a novel method to enhance the current airport surveillance systems used in Advanced Surface Movement Guidance and Control Systems (A-SMGCS). The proposed method allows for the automatic calibration of measurement models and enhanced detection of non-ideal situations, increasing the integrity of surveillance products. It is based on the definition of a set of observables from the surveillance processing chain and a rule-based expert system aimed at changing the data processing methods accordingly.
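The abstract does not specify the expert system's rules; as a rough illustration of the idea only, the sketch below (every rule name, observable, and threshold is hypothetical) evaluates surveillance-chain observables against simple rules to trigger recalibration or a change of processing method:

```python
# Minimal rule-based expert system sketch: hypothetical observables from a
# surveillance processing chain trigger changes in the data processing method.

def evaluate(observables, rules):
    """Return the actions of every rule whose condition holds."""
    return [rule["action"] for rule in rules if rule["when"](observables)]

# Hypothetical observables computed from the surveillance chain.
observables = {"residual_bias_m": 42.0, "track_drop_rate": 0.08,
               "clutter_density": 0.01}

# Hypothetical rules: each maps a condition on the observables to an action.
rules = [
    {"when": lambda o: o["residual_bias_m"] > 30.0,
     "action": "recalibrate_measurement_model"},
    {"when": lambda o: o["track_drop_rate"] > 0.05,
     "action": "raise_integrity_alert"},
    {"when": lambda o: o["clutter_density"] > 0.10,
     "action": "switch_to_clutter_robust_filter"},
]

actions = evaluate(observables, rules)
```

In a real A-SMGCS chain the rule base would be far richer; the point is only the separation of observables, conditions, and processing actions.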
Resumo:
Fine-fraction (<63 µm) grain-size analyses of 530 samples from Holes 1095A, 1095B, and 1095D allow assessment of the downhole grain-size distribution at Drift 7. A variety of data processing methods, statistical treatment, and display techniques were used to describe this data set. The downhole fine-fraction grain-size distribution documents significant variations in the average grain-size composition and its cyclic pattern, revealed in five prominent intervals: (1) between 0 and 40 meters composite depth (mcd) (0 and 1.3 Ma), (2) between 40 and 80 mcd (1.3 and 2.4 Ma), (3) between 80 and 220 mcd (2.4 and 6 Ma), (4) between 220 and 360 mcd, and (5) below 360 mcd (prior to 8.1 Ma). In an approach designed to characterize depositional processes at Drift 7, we used statistical parameters determined by the method of moments for the sortable silt fraction to distinguish groups in the grain-size data set. We found three distinct grain-size populations and used these for a tentative environmental interpretation. Population 1 is related to a process in which glacially eroded shelf material was redeposited by turbidites with an ice-rafted debris influence. Population 2 is composed of interglacial turbidites. Population 3 is connected to depositional sequence tops linked to bioturbated sections that, in turn, are influenced by contourite currents and pelagic background sedimentation.
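The method of moments mentioned above summarizes a grain-size distribution by its mean, sorting (standard deviation), skewness, and kurtosis on the phi scale; a minimal sketch (the size classes and weight percentages below are invented for illustration, not data from Drift 7):

```python
import math

def moment_statistics(midpoints_phi, weights):
    """Grain-size statistics by the method of moments, on the phi scale:
    mean (1st moment), sorting/std (2nd), skewness (3rd), kurtosis (4th)."""
    total = sum(weights)
    f = [w / total for w in weights]               # fractional abundances
    mean = sum(p * fi for p, fi in zip(midpoints_phi, f))
    var = sum(fi * (p - mean) ** 2 for p, fi in zip(midpoints_phi, f))
    sd = math.sqrt(var)
    skew = sum(fi * (p - mean) ** 3 for p, fi in zip(midpoints_phi, f)) / sd ** 3
    kurt = sum(fi * (p - mean) ** 4 for p, fi in zip(midpoints_phi, f)) / sd ** 4
    return mean, sd, skew, kurt

# Hypothetical sortable-silt size classes; phi = -log2(diameter in mm).
midpoints = [4.5, 5.0, 5.5, 6.0, 6.5]    # class midpoints, phi units
weights = [5.0, 20.0, 40.0, 25.0, 10.0]  # weight percent per class
mean, sd, skew, kurt = moment_statistics(midpoints, weights)
```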
Resumo:
Intervention of the Santa Casa da Misericórdia de Évora: the quality of social responses and users' perceived satisfaction. This internship report aims to understand and analyse the status quo of the dimensions of satisfaction (encompassing personal and social well-being) of the users of the care units of the Santa Casa da Misericórdia de Évora, thereby assessing the quality of life of those users and addressing the potential and/or evident institutional needs to be met and the social problems to be eradicated. The information was collected through two methodological instruments: a self-administered questionnaire survey (users) and semi-structured interviews (the technical directors of the care units). Regarding the data processing methods, IBM SPSS 22 was used for the first instrument and content analysis for the second. After identifying the existing needs, a proposal for socio-organisational intervention was drawn up in order to address them. This document is structured in four axes: Axis 1, the theoretical component; Axis 2, the methodological component; Axis 3, the results component; and Axis 4, the socio-organisational intervention component.
Resumo:
The cost of spatial join processing can be very high because of the large sizes of spatial objects and the computation-intensive spatial operations. While parallel processing seems a natural solution to this problem, it is not clear how spatial data can be partitioned for this purpose. Various spatial data partitioning methods are examined in this paper. A framework combining the data-partitioning techniques used by most parallel join algorithms in relational databases and the filter-and-refine strategy for spatial operation processing is proposed for parallel spatial join processing. Object duplication caused by multi-assignment in spatial data partitioning can result in extra CPU cost as well as extra communication cost. We find that the key to overcoming this problem is to preserve spatial locality in task decomposition. We show in this paper that a near-optimal speedup can be achieved for parallel spatial join processing using our new algorithms.
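The multi-assignment problem described above can be made concrete with a toy example (this is an illustration of the general technique, not the paper's algorithms): rectangles are assigned to every grid cell they overlap, so an object spanning several cells is duplicated; each cell then runs a filter step on bounding boxes, and duplicated result pairs are removed at the end.

```python
# Toy grid partitioning with multi-assignment, followed by a per-cell
# filter step (MBR intersection); duplicated pairs are removed at the end.

def cells_for(mbr, grid, extent):
    """All grid cells an object's (xmin, ymin, xmax, ymax) box overlaps."""
    xmin, ymin, xmax, ymax = mbr
    step = extent / grid
    return {(i, j)
            for i in range(int(xmin // step), int(xmax // step) + 1)
            for j in range(int(ymin // step), int(ymax // step) + 1)}

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def parallel_spatial_join(R, S, grid=2, extent=10.0):
    # Partition both relations; objects spanning cells are duplicated.
    buckets = {}
    for name, rel in (("R", R), ("S", S)):
        for oid, mbr in rel.items():
            for cell in cells_for(mbr, grid, extent):
                buckets.setdefault(cell, {"R": [], "S": []})[name].append((oid, mbr))
    # Filter step per cell (each cell could run on its own processor).
    result = set()
    for b in buckets.values():
        for rid, rm in b["R"]:
            for sid, sm in b["S"]:
                if intersects(rm, sm):
                    result.add((rid, sid))   # the set removes duplicated pairs
    return result

R = {"r1": (1, 1, 6, 2), "r2": (7, 7, 8, 8)}   # r1 spans two cells: duplicated
S = {"s1": (5, 1, 9, 3), "s2": (0, 8, 1, 9)}
pairs = parallel_spatial_join(R, S)
```

A refine step on exact geometries would follow the MBR filter; it is omitted here.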
Resumo:
In this paper, processing methods of Fourier optics implemented in a digital holographic microscopy system are presented. The proposed methodology is based on the ability of digital holography to reconstruct the whole recorded wave front and, consequently, to determine the phase and intensity distribution in any arbitrary plane located between the object and the recording plane. In this way, in digital holographic microscopy the field produced by the objective lens can be reconstructed along its propagation, allowing the reconstruction of the back focal plane of the lens, so that the complex amplitudes of the Fraunhofer diffraction, or equivalently the Fourier transform, of the light distribution across the object can be known. Manipulation of the Fourier transform plane makes it possible to design digital methods of optical processing and image analysis. The proposed method has great practical utility and represents a powerful tool in image analysis and data processing. The theoretical aspects of the method are presented, and its validity has been demonstrated using computer-generated holograms and simulated images of microscopic objects. (c) 2007 Elsevier B.V. All rights reserved.
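The central relation used above, that the back focal plane of the lens carries the Fourier transform of the object field, can be checked numerically. The 1-D sketch below (the sample count and the two-slit object are invented for illustration) computes the Fourier-plane amplitudes of a simple aperture with a naive discrete Fourier transform:

```python
import cmath

def dft(field):
    """Naive 1-D discrete Fourier transform of a sampled complex field."""
    n = len(field)
    return [sum(field[m] * cmath.exp(-2j * cmath.pi * k * m / n)
                for m in range(n))
            for k in range(n)]

# Hypothetical 1-D object: a double slit (two transparent samples).
object_field = [0, 1, 0, 0, 0, 1, 0, 0]

# Field in the back focal plane of the lens = Fourier transform of the object.
fourier_plane = dft(object_field)
intensity = [abs(a) ** 2 for a in fourier_plane]
```

By Parseval's theorem the total Fourier-plane intensity equals N times the object's total intensity, which gives a quick correctness check on such reconstructions.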
Resumo:
Urbanization and the ability to manage for a sustainable future present numerous challenges for geographers and planners in metropolitan regions. Remotely sensed data are inherently suited to provide information on urban land cover characteristics, and their change over time, at various spatial and temporal scales. Data models for establishing the range of urban land cover types and their biophysical composition (vegetation, soil, and impervious surfaces) are integrated to provide a hierarchical approach to classifying land cover within urban environments. These data also provide an essential component for current simulation models of urban growth patterns, as both calibration and validation data. The first stages of the approach have been applied to examine urban growth between 1988 and 1995 for a rapidly developing area in southeast Queensland, Australia. Landsat Thematic Mapper image data provided accurate (83% adjusted overall accuracy) classification of broad land cover types and their change over time. The combination of commonly available remotely sensed data, image processing methods, and emerging urban growth models highlights an important application for current and next generation moderate spatial resolution image data in studies of urban environments.
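The reported 83% adjusted overall accuracy implies an error-matrix assessment; as a generic illustration (the confusion matrix below is invented, and the paper's exact adjustment procedure is not given in this snippet), overall accuracy and the chance-corrected kappa coefficient can both be computed from a confusion matrix:

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix
    (rows = mapped class, columns = reference class)."""
    n = sum(sum(row) for row in confusion)
    diag = sum(confusion[i][i] for i in range(len(confusion)))
    overall = diag / n
    # Expected chance agreement from the row and column marginals.
    expected = sum(sum(confusion[i]) * sum(r[i] for r in confusion)
                   for i in range(len(confusion))) / n ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, kappa

# Hypothetical 3-class error matrix (e.g. vegetation / soil / impervious).
cm = [[45, 3, 2],
      [4, 30, 6],
      [1, 5, 24]]
overall, kappa = accuracy_and_kappa(cm)
```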
Resumo:
Microarrays allow thousands of genes to be monitored simultaneously, quantifying the abundance of transcripts under the same experimental condition at the same time. Among the various available array technologies, two-channel cDNA microarray experiments, the focus of this work, have arisen in numerous technical protocols associated with genomic studies. Microarray experiments involve many steps, and each one can affect the quality of the raw data. Background correction and normalization are preprocessing techniques used to clean and correct the raw data when undesirable fluctuations arise from technical factors. Several recent studies have shown that no preprocessing strategy outperforms the others in all circumstances, so it seems difficult to provide general recommendations. In this work, we propose the use of exploratory techniques to visualize the effects of preprocessing methods on the statistical analysis of two-channel cancer microarray data sets in which the cancer types (classes) are known. The arrow plot was used to select differentially expressed genes, and the graph of profiles resulting from correspondence analysis was used to visualize the results. Six background-correction methods and six normalization methods were combined, yielding 36 preprocessing strategies, which were analyzed on a published cDNA microarray database (Liver), available at http://genome-www5.stanford.edu/, whose microarrays were already classified by cancer type. All statistical analyses were performed using the R statistical software.
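For two-channel data, background correction and normalization act on the red/green intensities before log-ratios are analyzed. A minimal sketch of one such strategy (the intensities are invented, and this simple subtraction-plus-median scheme is only a stand-in for the 36 method combinations studied):

```python
import math
import statistics

def preprocess(red_fg, red_bg, green_fg, green_bg):
    """Background-subtract each channel, then median-centre the log-ratios M."""
    m = []
    for rf, rb, gf, gb in zip(red_fg, red_bg, green_fg, green_bg):
        r = max(rf - rb, 1.0)          # floor avoids log of non-positive values
        g = max(gf - gb, 1.0)
        m.append(math.log2(r / g))     # M = log2(R/G) per spot
    shift = statistics.median(m)       # global median normalization
    return [v - shift for v in m]

# Hypothetical spot intensities for five genes (foreground and background).
M = preprocess(red_fg=[900, 500, 1200, 400, 800],
               red_bg=[100, 100, 200, 100, 100],
               green_fg=[500, 900, 700, 300, 450],
               green_bg=[100, 100, 200, 100, 50])
```

After normalization the median log-ratio is zero, so remaining deviations reflect differential expression rather than dye imbalance.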
Resumo:
Coronary artery disease (CAD) is currently one of the most prevalent diseases in the world population, and calcium deposits in coronary arteries are a direct risk factor. These can be assessed by the calcium score (CS) application, available via a computed tomography (CT) scan, which gives an accurate indication of the development of the disease. However, the ionising radiation applied to patients is high. This study aimed to optimise the protocol acquisition in order to reduce the radiation dose and to explain the flow of procedures to quantify CAD. The main differences in the clinical results, when automated or semi-automated post-processing is used, will be shown, and the epidemiology, imaging, risk factors and prognosis of the disease described. The software steps and the values that allow the risk of developing CAD to be predicted will be presented. A 64-row multidetector CT scanner with dual source and two phantoms (pig hearts) were used to demonstrate the advantages and disadvantages of the Agatston method. The tube energy was balanced. Two measurements were obtained in each of the three experimental protocols (64, 128, 256 mAs). Considerable changes appeared between the values of CS relating to the protocol variation. The predefined standard protocol provided the lowest dose of radiation (0.43 mGy). This study found that the variation in the radiation dose between protocols, taking into consideration the dose control systems attached to the CT equipment and image quality, was not sufficient to justify changing the default protocol provided by the manufacturer.
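The Agatston method referenced above scores each calcified lesion by its area times a density weight derived from the lesion's peak attenuation in Hounsfield units, summed over all lesions; a simplified sketch (the lesion data are invented):

```python
def density_weight(peak_hu):
    """Agatston density factor from the lesion's peak attenuation (HU)."""
    if peak_hu < 130:
        return 0          # below the calcium threshold, not scored
    if peak_hu < 200:
        return 1
    if peak_hu < 300:
        return 2
    if peak_hu < 400:
        return 3
    return 4

def agatston_score(lesions):
    """Sum of lesion area (mm^2) times density weight over all lesions."""
    return sum(area * density_weight(peak) for area, peak in lesions)

# Hypothetical lesions as (area in mm^2, peak attenuation in HU).
score = agatston_score([(4.0, 150), (2.5, 320), (1.0, 90)])
```

The hard 130 HU threshold and stepped weights are why the score is sensitive to acquisition parameters, which is the protocol-variation effect the study examines.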
Resumo:
Recently, there has been a growing interest in the field of metabolomics, materialized by a remarkable growth in experimental techniques, available data and related biological applications. Indeed, techniques such as Nuclear Magnetic Resonance, Gas or Liquid Chromatography, Mass Spectrometry, Infrared and UV-visible spectroscopies have provided extensive datasets that can help in tasks such as biological and biomedical discovery, biotechnology and drug development. However, as with other omics data, the analysis of metabolomics datasets provides multiple challenges, both in terms of methodologies and in the development of appropriate computational tools. Indeed, none of the available software tools addresses the multiplicity of existing techniques and data analysis tasks. In this work, we make available a novel R package, named specmine, which provides a set of methods for metabolomics data analysis, including data loading in different formats, pre-processing, metabolite identification, univariate and multivariate data analysis, machine learning, and feature selection. Importantly, the implemented methods provide adequate support for the analysis of data from diverse experimental techniques, integrating a large set of functions from several R packages in a powerful yet simple-to-use environment. The package, already available in CRAN, is accompanied by a web site where users can deposit datasets, scripts and analysis reports to be shared with the community, promoting the efficient sharing of metabolomics data analysis pipelines.
Resumo:
Researchers working in the field of global connectivity analysis using diffusion magnetic resonance imaging (MRI) can count on a wide selection of software packages for processing their data, with methods ranging from the reconstruction of the local intra-voxel axonal structure to the estimation of the trajectories of the underlying fibre tracts. However, each package is generally task-specific and uses its own conventions and file formats. In this article we present the Connectome Mapper, a software pipeline aimed at helping researchers through the tedious process of organising, processing and analysing diffusion MRI data to perform global brain connectivity analyses. Our pipeline is written in Python and is freely available as open-source at www.cmtk.org.
Resumo:
BACKGROUND: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data poses new challenges; currently a fair proportion of the tags are routinely discarded due to an inability to match them to a reference sequence, thereby reducing the effective throughput of the technology. RESULTS: We propose a novel base calling algorithm using model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads. CONCLUSION: We show that the method improves genome coverage and number of usable tags as compared with Solexa's data processing pipeline by an average of 15%. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.
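The idea of coding ambiguous bases with IUPAC symbols can be sketched simply: at each cycle, every base whose probability is close enough to the best call is retained, and the retained set maps to an IUPAC code (the retention ratio and the per-cycle probabilities below are invented; the paper's actual calls come from model-based clustering of fluorescence intensities):

```python
# Map a set of plausible bases to its IUPAC ambiguity symbol.
IUPAC = {frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G",
         frozenset("T"): "T", frozenset("AG"): "R", frozenset("CT"): "Y",
         frozenset("GC"): "S", frozenset("AT"): "W", frozenset("GT"): "K",
         frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
         frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N"}

def call_base(probs, ratio=0.5):
    """Keep every base whose probability is >= ratio * the best base's,
    then emit the IUPAC symbol for the retained set."""
    best = max(probs.values())
    kept = frozenset(b for b, p in probs.items() if p >= ratio * best)
    return IUPAC[kept]

# Hypothetical per-cycle base probabilities for one read.
read = [call_base({"A": 0.90, "C": 0.05, "G": 0.03, "T": 0.02}),
        call_base({"A": 0.48, "C": 0.02, "G": 0.46, "T": 0.04}),
        call_base({"A": 0.30, "C": 0.28, "G": 0.22, "T": 0.20})]
```

A confident cycle yields a plain base, a two-way ambiguity yields a two-base code such as R, and a near-uniform cycle yields N; sub-tag selection would then trim low-information positions from the read ends.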
Resumo:
BACKGROUND: Finding genes that are differentially expressed between conditions is an integral part of understanding the molecular basis of phenotypic variation. In the past decades, DNA microarrays have been used extensively to quantify the abundance of mRNA corresponding to different genes, and more recently high-throughput sequencing of cDNA (RNA-seq) has emerged as a powerful competitor. As the cost of sequencing decreases, it is conceivable that the use of RNA-seq for differential expression analysis will increase rapidly. To exploit the possibilities and address the challenges posed by this relatively new type of data, a number of software packages have been developed especially for differential expression analysis of RNA-seq data. RESULTS: We conducted an extensive comparison of eleven methods for differential expression analysis of RNA-seq data. All methods are freely available within the R framework and take as input a matrix of counts, i.e. the number of reads mapping to each genomic feature of interest in each of a number of samples. We evaluate the methods based on both simulated data and real RNA-seq data. CONCLUSIONS: Very small sample sizes, which are still common in RNA-seq experiments, impose problems for all evaluated methods and any results obtained under such conditions should be interpreted with caution. For larger sample sizes, the methods combining a variance-stabilizing transformation with the 'limma' method for differential expression analysis perform well under many different conditions, as does the nonparametric SAMseq method.
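All the compared methods share the same input: a genes-by-samples count matrix. As a minimal, illustrative stand-in (not one of the eleven evaluated methods), the sketch below normalizes each sample by its library size and ranks genes by log-fold-change between two conditions, with a pseudo-count to stabilize low counts:

```python
import math

def log_fold_changes(counts, group_a, group_b, pseudo=0.5):
    """counts: gene -> list of per-sample read counts.
    Returns gene -> log2 fold change of library-size-normalized means."""
    n_samples = len(next(iter(counts.values())))
    lib = [sum(counts[g][j] for g in counts) for j in range(n_samples)]
    lfc = {}
    for gene, row in counts.items():
        cpm = [1e6 * c / l for c, l in zip(row, lib)]   # counts per million
        mean_a = sum(cpm[j] for j in group_a) / len(group_a)
        mean_b = sum(cpm[j] for j in group_b) / len(group_b)
        lfc[gene] = math.log2((mean_a + pseudo) / (mean_b + pseudo))
    return lfc

# Hypothetical counts: two samples per condition, three genes.
counts = {"g1": [100, 120, 10, 12],
          "g2": [50, 40, 45, 55],
          "g3": [5, 8, 60, 50]}
lfc = log_fold_changes(counts, group_a=[0, 1], group_b=[2, 3])
```

Real methods add a statistical test and variance modelling on top of this; with only two replicates per group, as the conclusions note, any such test should be interpreted with caution.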
Resumo:
Automatic environmental monitoring networks supported by wireless communication technologies nowadays provide large and ever-increasing volumes of data. The use of this information in natural hazard research is an important issue. Particularly useful for risk assessment and decision making are the spatial maps of hazard-related parameters produced from point observations and available auxiliary information. The purpose of this article is to present and explore the appropriate tools to process large amounts of available data and produce predictions at fine spatial scales. These are the algorithms of machine learning, which are aimed at non-parametric robust modelling of non-linear dependencies from empirical data. The computational efficiency of the data-driven methods allows the prediction maps to be produced in real time, which makes them superior to physical models for operational use in risk assessment and mitigation. This is particularly the case in the spatial prediction of climatic variables (topo-climatic mapping). In the complex topographies of mountainous regions, meteorological processes are highly influenced by the relief. The article shows how these relations, possibly regionalized and non-linear, can be modelled from data using the information from digital elevation models. The particular illustration of the developed methodology concerns the mapping of temperatures (including the situations of Föhn and temperature inversion) given the measurements taken from the Swiss meteorological monitoring network. The range of the methods used in the study includes data-driven feature selection, support vector algorithms and artificial neural networks.
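A minimal data-driven sketch of the mapping idea (the station coordinates, elevations, and temperatures are invented, and this k-nearest-neighbour stand-in is far simpler than the support vector and neural network methods the study uses): each station is described by (x, y, elevation) taken from a digital elevation model, and temperature at an unmeasured location is predicted from nearby stations in that feature space.

```python
import math

def knn_predict(stations, query, k=3):
    """Predict temperature at query = (x, y, elevation) as the mean of the
    k nearest stations in (x, y, elevation) feature space."""
    dists = sorted((math.dist(feat, query), temp) for feat, temp in stations)
    nearest = dists[:k]
    return sum(t for _, t in nearest) / k

# Hypothetical stations: ((x km, y km, elevation km), temperature in deg C).
stations = [((0.0, 0.0, 0.4), 12.0), ((1.0, 0.5, 0.5), 11.0),
            ((2.0, 1.0, 1.5), 5.0), ((2.5, 2.0, 2.0), 1.0),
            ((0.5, 2.5, 0.6), 10.5)]
prediction = knn_predict(stations, (0.6, 0.6, 0.5))
```

Including elevation in the feature space is what lets such models capture relief effects like inversions, which a purely horizontal interpolation would miss.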
Resumo:
Background Nowadays, combining the different sources of information to improve the available biological knowledge is a challenge in bioinformatics. One of the most powerful approaches for integrating heterogeneous data types is the family of kernel-based methods. Kernel-based data integration consists of two basic steps: first, the right kernel is chosen for each data set; second, the kernels from the different data sources are combined to give a complete representation of the available data for a given statistical task. Results We analyze the integration of data from several sources of information using kernel PCA, from the point of view of reducing dimensionality. Moreover, we improve the interpretability of kernel PCA by adding to the plot the representation of the input variables belonging to any of the datasets. In particular, for each input variable or linear combination of input variables, we can represent the direction of maximum growth locally, which allows us to identify those samples with higher/lower values of the variables analyzed. Conclusions The integration of different datasets and the simultaneous representation of samples and variables together give us a better understanding of biological knowledge.
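The two basic steps described, one kernel per data source and then a combination, can be sketched as a weighted sum of kernel matrices followed by extraction of the leading kernel principal component (here via power iteration on the doubly centred combined kernel; the data, kernels, and weights below are invented for illustration):

```python
def linear_kernel(X):
    """Linear kernel matrix K = X X^T for one data source."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def combine(kernels, weights):
    """Weighted sum of kernel matrices from different data sources."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]

def centre(K):
    """Double-centre a kernel matrix, as required by kernel PCA."""
    n = len(K)
    row = [sum(r) / n for r in K]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - row[j] + tot for j in range(n)]
            for i in range(n)]

def leading_component(K, iters=200):
    """Leading eigenvector of K by power iteration (scores up to scale)."""
    n = len(K)
    v = [float(i + 1) for i in range(n)]  # generic start: an all-ones vector
                                          # would be annihilated by centring
    for _ in range(iters):
        v = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v

# Two hypothetical data sources measured on the same four samples.
X1 = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2], [-0.9, 0.0]]   # source 1 features
X2 = [[0.2], [0.1], [-0.3], [-0.2]]                        # source 2 features
K = centre(combine([linear_kernel(X1), linear_kernel(X2)], weights=[0.7, 0.3]))
scores = leading_component(K)
```

The leading-component scores separate the two sample groups present in both sources; plotting variable directions on top of such scores is the interpretability step the paper adds.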
Resumo:
Motivation: The comparative analysis of gene gain and loss rates is critical for understanding the role of natural selection and adaptation in shaping gene family sizes. Studying complete genome data from closely related species allows accurate estimation of gene family turnover rates. Current methods and software tools, however, are not well designed for dealing with certain kinds of functional elements, such as microRNAs or transcription factor binding sites. Results: Here, we describe BadiRate, a new software tool to estimate family turnover rates, as well as the number of elements in internal phylogenetic nodes, by likelihood-based methods and parsimony. It implements two stochastic population models, which provide the appropriate statistical framework for testing hypotheses such as lineage-specific gene family expansions or contractions. We have assessed the accuracy of BadiRate by computer simulations, and have also illustrated its functionality by analyzing a representative empirical dataset.
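The parsimony side of such inference can be illustrated with a tiny example: given family sizes at the leaves of a phylogeny, a Sankoff-style dynamic program finds ancestral counts that minimize the total number of gains and losses (the tree, the family sizes, and the unit-cost-per-element assumption below are invented for illustration, not BadiRate's models):

```python
# Minimum number of gain/loss events for one gene family on a small tree,
# with unit cost per element gained or lost (a parsimony criterion).

def min_events(tree, leaf_sizes, max_size=10):
    """tree: internal node -> (left child, right child); leaves map to counts.
    Returns cost[node][s] = minimal events below node if it holds s copies."""
    cost = {}

    def solve(node):
        if node in leaf_sizes:  # a leaf: its observed size is fixed
            cost[node] = [0 if s == leaf_sizes[node] else float("inf")
                          for s in range(max_size + 1)]
            return
        left, right = tree[node]
        solve(left)
        solve(right)
        cost[node] = [
            min(cost[left][t] + abs(s - t) for t in range(max_size + 1)) +
            min(cost[right][t] + abs(s - t) for t in range(max_size + 1))
            for s in range(max_size + 1)]

    solve("root")
    return cost

# Hypothetical 4-taxon tree: root -> (n1, n2); n1 -> (A, B); n2 -> (C, D).
tree = {"root": ("n1", "n2"), "n1": ("A", "B"), "n2": ("C", "D")}
sizes = {"A": 3, "B": 2, "C": 5, "D": 5}
cost = min_events(tree, sizes)
best = min(cost["root"])   # minimal total gains + losses over root sizes
```

Likelihood-based inference replaces these unit costs with probabilities from a stochastic birth-death model, which is what allows formal hypothesis testing.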