8 resultados para Histograms
em Instituto Politécnico do Porto, Portugal
Resumo:
This paper analyzes DNA information using entropy and phase plane concepts. First, the DNA code is converted into a numerical format by means of histograms that capture DNA sequence length ranging from one up to ten bases. This strategy measures dynamical evolutions from 4 up to 410 signal states. The resulting histograms are analyzed using three distinct entropy formulations namely the Shannon, Rényie and Tsallis definitions. Charts of entropy versus sequence length are applied to a set of twenty four species, characterizing 486 chromosomes. The information is synthesized and visualized by adapting phase plane concepts leading to a categorical representation of chromosomes and species.
Resumo:
We describe a novel approach to explore DNA nucleotide sequence data, aiming to produce high-level categorical and structural information about the underlying chromosomes, genomes and species. The article starts by analyzing chromosomal data through histograms using fixed length DNA sequences. After creating the DNA-related histograms, a correlation between pairs of histograms is computed, producing a global correlation matrix. These data are then used as input to several data processing methods for information extraction and tabular/graphical output generation. A set of 18 species is processed and the extensive results reveal that the proposed method is able to generate significant and diversified outputs, in good accordance with current scientific knowledge in domains such as genomics and phylogenetics.
Resumo:
This paper aims to study the relationships between chromosomal DNA sequences of twenty species. We propose a methodology combining DNA-based word frequency histograms, correlation methods, and an MDS technique to visualize structural information underlying chromosomes (CRs) and species. Four statistical measures are tested (Minkowski, Cosine, Pearson product-moment, and Kendall τ rank correlations) to analyze the information content of 421 nuclear CRs from twenty species. The proposed methodology is built on mathematical tools and allows the analysis and visualization of very large amounts of stream data, like DNA sequences, with almost no assumptions other than the predefined DNA “word length.” This methodology is able to produce comprehensible three-dimensional visualizations of CR clustering and related spatial and structural patterns. The results of the four test correlation scenarios show that the high-level information clusterings produced by the MDS tool are qualitatively similar, with small variations due to each correlation method characteristics, and that the clusterings are a consequence of the input data and not method’s artifacts.
Resumo:
Proteins are biochemical entities consisting of one or more blocks typically folded in a 3D pattern. Each block (a polypeptide) is a single linear sequence of amino acids that are biochemically bonded together. The amino acid sequence in a protein is defined by the sequence of a gene or several genes encoded in the DNA-based genetic code. This genetic code typically uses twenty amino acids, but in certain organisms the genetic code can also include two other amino acids. After linking the amino acids during protein synthesis, each amino acid becomes a residue in a protein, which is then chemically modified, ultimately changing and defining the protein function. In this study, the authors analyze the amino acid sequence using alignment-free methods, aiming to identify structural patterns in sets of proteins and in the proteome, without any other previous assumptions. The paper starts by analyzing amino acid sequence data by means of histograms using fixed length amino acid words (tuples). After creating the initial relative frequency histograms, they are transformed and processed in order to generate quantitative results for information extraction and graphical visualization. Selected samples from two reference datasets are used, and results reveal that the proposed method is able to generate relevant outputs in accordance with current scientific knowledge in domains like protein sequence/proteome analysis.
Resumo:
A instalação de sistemas de videovigilância, no interior ou exterior, em locais como aeroportos, centros comerciais, escritórios, edifícios estatais, bases militares ou casas privadas tem o intuito de auxiliar na tarefa de monitorização do local contra eventuais intrusos. Com estes sistemas é possível realizar a detecção e o seguimento das pessoas que se encontram no ambiente local, tornando a monitorização mais eficiente. Neste contexto, as imagens típicas (imagem natural e imagem infravermelha) são utilizadas para extrair informação dos objectos detectados e que irão ser seguidos. Contudo, as imagens convencionais são afectadas por condições ambientais adversas como o nível de luminosidade existente no local (luzes muito fortes ou escuridão total), a presença de chuva, de nevoeiro ou de fumo que dificultam a tarefa de monitorização das pessoas. Deste modo, tornou‐se necessário realizar estudos e apresentar soluções que aumentem a eficácia dos sistemas de videovigilância quando sujeitos a condições ambientais adversas, ou seja, em ambientes não controlados, sendo uma das soluções a utilização de imagens termográficas nos sistemas de videovigilância. Neste documento são apresentadas algumas das características das câmaras e imagens termográficas, assim como uma caracterização de cenários de vigilância. Em seguida, são apresentados resultados provenientes de um algoritmo que permite realizar a segmentação de pessoas utilizando imagens termográficas. O maior foco desta dissertação foi na análise dos modelos de descrição (Histograma de Cor, HOG, SIFT, SURF) para determinar o desempenho dos modelos em três casos: distinguir entre uma pessoa e um carro; distinguir entre duas pessoas distintas e determinar que é a mesma pessoa ao longo de uma sequência. De uma forma sucinta pretendeu‐se, com este estudo, contribuir para uma melhoria dos algoritmos de detecção e seguimento de objectos em sequências de vídeo de imagens termográficas. No final, através de uma análise dos resultados provenientes dos modelos de descrição, serão retiradas conclusões que servirão de indicação sobre qual o modelo que melhor permite discriminar entre objectos nas imagens termográficas.
Resumo:
This paper studies the information content of the chromosomes of twenty-three species. Several statistics considering different number of bases for alphabet character encoding are derived. Based on the resulting histograms, word delimiters and character relative frequencies are identified. The knowledge of this data allows moving along each chromosome while evaluating the flow of characters and words. The resulting flux of information is captured by means of Shannon entropy. The results are explored in the perspective of power law relationships allowing a quantitative evaluation of the DNA of the species.
Resumo:
The performance of the Weather Research and Forecast (WRF) model in wind simulation was evaluated under different numerical and physical options for an area of Portugal, located in complex terrain and characterized by its significant wind energy resource. The grid nudging and integration time of the simulations were the tested numerical options. Since the goal is to simulate the near-surface wind, the physical parameterization schemes regarding the boundary layer were the ones under evaluation. Also, the influences of the local terrain complexity and simulation domain resolution on the model results were also studied. Data from three wind measuring stations located within the chosen area were compared with the model results, in terms of Root Mean Square Error, Standard Deviation Error and Bias. Wind speed histograms, occurrences and energy wind roses were also used for model evaluation. Globally, the model accurately reproduced the local wind regime, despite a significant underestimation of the wind speed. The wind direction is reasonably simulated by the model especially in wind regimes where there is a clear dominant sector, but in the presence of low wind speeds the characterization of the wind direction (observed and simulated) is very subjective and led to higher deviations between simulations and observations. Within the tested options, results show that the use of grid nudging in simulations that should not exceed an integration time of 2 days is the best numerical configuration, and the parameterization set composed by the physical schemes MM5–Yonsei University–Noah are the most suitable for this site. Results were poorer in sites with higher terrain complexity, mainly due to limitations of the terrain data supplied to the model. The increase of the simulation domain resolution alone is not enough to significantly improve the model performance. Results suggest that error minimization in the wind simulation can be achieved by testing and choosing a suitable numerical and physical configuration for the region of interest together with the use of high resolution terrain data, if available.
Resumo:
This research focuses on the influence of company sector and size on the level of utilization of Basic and Advanced Quality Tools. The paper starts with a literature review and then presents the methodology used for the survey. Based on the responses from 202 managers of Portuguese ISO 9001:2008 Quality Management System certified organizations, statistical tests were performed. Results show, with 95% confidence level, that industry and services have a similar proportion of use of Basic and Advanced Quality Tools. Concerning size, bigger companies show a higher trend to use Advanced Quality Tools than smaller ones. For Basic Quality Tools, there was no statistical significant difference at a 95% confidence level for different company sizes. The three basic Quality tools with higher utilization were Check sheets, Flow charts and Histograms (for Services) or Control Charts/ (for Industry), however 22% of the surveyed organizations reported not using Basic Quality Tools, which highlights a major improvement opportunity for these companies. Additional studies addressing motivations, benefits and barriers for Quality Tools application should be undertaken for further validation and understanding of these results.