215 results for preprocessing
Abstract:
We propose a method to encode 3D magnetic resonance image data, together with a decoder, in such a way that fast access to any 2D image is possible by decoding only the corresponding information from each subband image, which minimizes decoding time. This will be of immense use to the medical community, because most PET and MRI data are volumetric. Preprocessing is carried out at every level before the wavelet transformation to enable easier identification of coefficients from each subband image. Inclusion of special characters in the bit stream facilitates access to the corresponding information in the encoded data. Results are obtained by applying Daub4 along the x (row) and y (column) directions and Haar along the z (slice) direction. The results are comparable to those of the existing technique. In addition, decoding time is reduced by a factor of 1.98. Arithmetic coding is used to encode the corresponding information independently.
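The Haar step along the z (slice) direction can be illustrated with a minimal sketch. It assumes an unnormalized average/difference form of the Haar transform and a volume stored as a list of 2D slices. The function names `haar_z` and `decode_slice` are hypothetical, and the sketch leaves out the Daub4 stages along x and y, the preprocessing, and the arithmetic coding mentioned in the abstract.

```python
def haar_z(volume):
    """One-level Haar transform along z for a volume given as a list of 2D slices.

    Assumes an even number of slices. Returns (low, high), each half as deep.
    """
    low, high = [], []
    for k in range(0, len(volume), 2):
        a, b = volume[k], volume[k + 1]
        # Pairwise average (low-pass) and half-difference (high-pass) of two slices.
        low.append([[(p + q) / 2 for p, q in zip(ra, rb)] for ra, rb in zip(a, b)])
        high.append([[(p - q) / 2 for p, q in zip(ra, rb)] for ra, rb in zip(a, b)])
    return low, high


def decode_slice(low, high, z):
    """Reconstruct slice z from only the single low/high pair that covers it."""
    k = z // 2
    sign = 1 if z % 2 == 0 else -1
    return [[l + sign * h for l, h in zip(rl, rh)] for rl, rh in zip(low[k], high[k])]


# Tiny made-up volume: four 2x2 slices.
volume = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[0, 2], [4, 6]], [[2, 0], [6, 4]]]
low, high = haar_z(volume)
restored = [decode_slice(low, high, z) for z in range(len(volume))]
```

Each call to `decode_slice` touches one low-pass and one high-pass slice only, which is the kind of selective access the abstract relies on for fast retrieval of a single 2D image.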
Abstract:
The diffusion equation-based modeling of near-infrared light propagation in tissue is achieved by using a finite-element mesh for imaging real-tissue types, such as breast and brain. The finite-element mesh size (number of nodes) dictates the parameter space in optical tomographic imaging. The most commonly used finite-element meshing algorithms do not provide the flexibility of distinct nodal spacing in different regions of the imaging domain to take the sensitivity of the problem into consideration. This study presents a computationally efficient mesh simplification method that can be used as a preprocessing step for iterative image reconstruction, where the finite-element mesh is simplified by an edge collapsing algorithm to reduce the parameter space in regions where the sensitivity of the problem is relatively low. It is shown, using simulations and experimental phantom data for simple meshes/domains, that a significant reduction in parameter space can be achieved without compromising the reconstructed image quality. The maximum errors observed with the simplified meshes were less than 0.27% in the forward problem and 5% in the inverse problem.
Abstract:
In this paper, we develop a game theoretic approach for clustering features in a learning problem. Feature clustering can serve as an important preprocessing step in many problems such as feature selection and dimensionality reduction. In this approach, we view features as rational players of a coalitional game in which they form coalitions (or clusters) among themselves in order to maximize their individual payoffs. We show how the Nash Stable Partition (NSP), a well-known concept in coalitional game theory, provides a natural way of clustering features. Through this approach, one can obtain desirable properties of the clusters by choosing appropriate payoff functions. For a small number of features, the NSP-based clustering can be found by solving an integer linear program (ILP). However, for a large number of features, the ILP-based approach does not scale well, and hence we propose a hierarchical approach. Interestingly, a key result that we prove on the equivalence between a k-size NSP of a coalitional game and a minimum k-cut of an appropriately constructed graph comes in handy for large-scale problems. In this paper, we use the feature selection problem (in a classification setting) as a running example to illustrate our approach. We conduct experiments to illustrate the efficacy of our approach.
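The Nash stability condition can be made concrete with a small brute-force check: a partition is Nash stable if no player can raise its own payoff by moving to another coalition or by going alone. The sketch below is an illustration under assumptions. The similarity values and the additive `payoff` function are hypothetical, since the abstract does not specify payoffs, and the check is not the ILP or the hierarchical procedure the authors describe.

```python
def is_nash_stable(partition, payoff):
    """Return True if no player gains by moving to another coalition or going alone.

    partition: list of disjoint sets of players.
    payoff(i, coalition): payoff of player i as a member of `coalition`.
    """
    for own in partition:
        for i in own:
            current = payoff(i, own)
            # Candidate moves: join any other coalition, or form a singleton.
            targets = [other | {i} for other in partition if other is not own]
            targets.append({i})
            if any(payoff(i, t) > current for t in targets):
                return False
    return True


# Hypothetical pairwise similarities between four features.
sim = {("a", "b"): 0.9, ("c", "d"): 0.8, ("a", "c"): 0.1,
       ("a", "d"): 0.1, ("b", "c"): 0.2, ("b", "d"): 0.1}


def payoff(i, coalition):
    """Assumed payoff: total similarity of feature i to the other members."""
    return sum(sim.get((i, j), sim.get((j, i), 0.0)) for j in coalition if j != i)


grouped_is_stable = is_nash_stable([{"a", "b"}, {"c", "d"}], payoff)
swapped_is_stable = is_nash_stable([{"a", "c"}, {"b", "d"}], payoff)
```

With these made-up similarities, grouping the similar features together is stable, while the swapped partition is not, because feature `a` would prefer to join `b`.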
Abstract:
We propose an iterative algorithm to detect transient segments in audio signals. The short-time Fourier transform (STFT) is used to detect rapid local changes in the audio signal. The algorithm has two steps that are applied iteratively: (a) calculate a function of the STFT and (b) build a transient signal. A dynamic thresholding scheme is used to locate the potential positions of transients in the signal. The iterative procedure ensures that genuine transients are built up while localised spectral noise is suppressed by using an energy criterion. The extracted transient signal is then compared to a ground truth dataset. The algorithm performed well on two databases. On the EBU-SQAM database of monophonic sounds, the algorithm achieved an F-measure of 90%, while on our database of polyphonic audio an F-measure of 91% was achieved. This technique is being used as a preprocessing step for a tempo analysis algorithm and a TSR (Transients + Sines + Residue) decomposition scheme.
Abstract:
In this paper, we discuss the issues related to word recognition in born-digital word images. We introduce a novel method of power-law transformation on the word image for binarization. We show the improvement in image binarization and the consequent increase in the recognition performance of an OCR engine on the word image. The optimal value of gamma for a word image is automatically chosen by our algorithm with a fixed stroke width threshold. We have tested our algorithm exhaustively by varying the gamma and stroke width threshold values. By varying the gamma value, we found that our algorithm performed better than the results reported in the literature. On the ICDAR Robust Reading Systems Challenge-1: Word Recognition Task on the born-digital dataset, as compared to the recognition rate of 61.5% achieved by TH-OCR after suitable preprocessing by Yang et al. and 63.4% by ABBYY Fine Reader (used as the baseline by the competition organizers without any preprocessing), we achieved 82.9% using Omnipage OCR applied to the images after processing by our algorithm.
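A power-law (gamma) transformation maps each intensity r to s = 255 * (r / 255) ** gamma for 8-bit images. The following sketch shows that mapping followed by a fixed global threshold. The function names, the threshold of 128, and the sample pixel values are assumptions made for illustration; the automatic gamma selection based on stroke width described in the abstract is not reproduced here.

```python
def power_law(pixels, gamma):
    """Apply s = 255 * (r / 255) ** gamma to 8-bit grayscale pixels."""
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in pixels]


def binarize(pixels, threshold=128):
    """Global threshold (an assumed, simplistic binarization rule)."""
    return [[1 if p >= threshold else 0 for p in row] for row in pixels]


# Made-up 2x3 grayscale patch.
image = [[0, 64, 128], [192, 255, 100]]
brightened = power_law(image, 0.5)   # gamma < 1 lifts dark and mid tones
darkened = power_law(image, 2.0)     # gamma > 1 pushes mid tones down
binary = binarize(darkened)
```

Because the transformation changes which pixels fall on either side of a threshold, sweeping gamma gives a family of candidate binarizations from which one can be selected.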
Abstract:
In this paper we present a segmentation algorithm to extract foreground object motion in a moving camera scenario without any preprocessing step such as tracking selected features, video alignment, or foreground segmentation. By viewing it as a curve fitting problem on advected particle trajectories, we use RANSAC to find the polynomial that best fits the camera motion and identify all trajectories that correspond to the camera motion. The remaining trajectories are those due to the foreground motion. By using the superposition principle, we subtract the motion due to the camera from the foreground trajectories and obtain the true object-induced trajectories. We show that our method performs on par with state-of-the-art techniques, with an execution time speed-up of 10x-40x. We compare the results on real-world datasets such as UCF-ARG, UCF Sports and Liris-HARL. We further show that it can be used to perform video alignment.
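The role RANSAC plays here, separating samples that follow a dominant model from those that do not, can be sketched on a toy problem. The example below fits a degree-1 polynomial to 2D points rather than a higher-order polynomial to particle trajectories, so it is a deliberate simplification. The function name `ransac_line`, the tolerance, the iteration count, and the synthetic data are all assumptions.

```python
import random


def ransac_line(points, n_iters=200, tol=0.5, seed=0):
    """Fit y = m * t + c by RANSAC; return (m, c, inlier_indices)."""
    rng = random.Random(seed)
    best = (0.0, 0.0, [])
    for _ in range(n_iters):
        # Minimal sample: two points define a candidate line.
        (t1, y1), (t2, y2) = rng.sample(points, 2)
        if t1 == t2:
            continue
        m = (y2 - y1) / (t2 - t1)
        c = y1 - m * t1
        inliers = [i for i, (t, y) in enumerate(points) if abs(y - (m * t + c)) <= tol]
        if len(inliers) > len(best[2]):
            best = (m, c, inliers)
    return best


# Synthetic data: dominant motion y = 2t + 1 plus three off-model samples.
points = [(t, 2 * t + 1) for t in range(10)] + [(2, 15), (5, -3), (8, 30)]
m, c, inliers = ransac_line(points)
outliers = [i for i in range(len(points)) if i not in inliers]
```

In the setting of the abstract, the inliers would correspond to camera-induced motion and the leftover samples to foreground motion.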
Abstract:
Magnetic Resonance Spectroscopy (MRS) offers a unique opportunity to measure brain metabolites in vivo, and in doing so enables one to understand the brain function and cellular processes implicated in the pathophysiology of psychiatric disorders. MRS, in addition to being non-invasive, is free of radioactive tracers and ionizing radiation, a distinct advantage over other imaging modalities such as positron emission tomography and single photon emission computed tomography. With advances in MRS techniques, it is now possible to quantify concentrations of relevant compounds such as neurotransmitters, neuronal viability markers and pharmacological compounds. The majority of MRS studies have examined neurometabolites in schizophrenia, a common and debilitating psychiatric disorder. Abnormalities in N-acetyl aspartate and glutamate are consistently reported, while the reports regarding myo-inositol and choline are inconsistent. These abnormalities persist across illness stages and despite treatment. However, multiple technical challenges have limited the widespread use of MRS in psychiatric disorders. Guidelines for uniform acquisition and preprocessing are the need of the hour and would increase the replicability and validity of MRS measures in psychiatry. Finally, long-term, prospective, longitudinal studies are required in different psychiatric disorders for potential clinical applications.
Abstract:
Use of fuels other than woody biomass has generally been limited to rice husk, and other residues are rarely tried as fuel in a gasification system. With the availability of woody biomass in countries like India, alternate fuels are being explored for a sustainable supply of fuel. Use of agro-residues has been explored after briquetting. A few feedstocks, such as coconut fronds and maize cobs, might require fewer preprocessing steps than briquetting. The paper presents a detailed investigation into using coconut fronds as a fuel in an open-top downdraft gasification system. The fuel has an ash content of 7% and was dried to a moisture level of 12%. The average bulk density was found to be 230 kg/m3 at an average fuel particle size of 40 mm, compared to 350 kg/m3 for standard wood pieces. A typical dry coconut frond weighs about 2.5 kg and is on average 6 m long; 90% of the frond is the petiole, which is generally used as fuel. The focus was also to compare the overall process with operation on a typical woody biomass such as subabul, whose ash content is 1%. The open-top gasification system consists of a reactor and a cooling and cleaning system, along with water treatment. The performance parameters studied were the gas composition, tar and particulates in the clean gas, water quality and reactor pressure drop, in addition to standard data such as fuel flow rate. The average gas composition was found to be CO 15 +/- 1.0%, H2 16 +/- 1%, CH4 0.5 +/- 0.1%, CO2 12.0 +/- 1.0% and the rest N2, compared to CO 19 +/- 1.0%, H2 17 +/- 1.0%, CH4 1 +/- 0.2%, CO2 12 +/- 1.0% and the rest N2 for woody biomass. The tar and particulate content in the clean gas was found to be about 10 and 12 mg/m3 in both cases. The high ash content of the material increased the pressure drop with coconut fronds compared to woody biomass.
Abstract:
The information-theoretic approach to security entails harnessing the correlated randomness available in nature to establish security. It uses tools from information theory and coding and yields provable security, even against an adversary with unbounded computational power. However, the feasibility of this approach in practice depends on the development of efficiently implementable schemes. In this paper, we review a special class of practical schemes for information-theoretic security that are based on 2-universal hash families. Specific cases of secret key agreement and wiretap coding are considered, and general themes are identified. The scheme presented for wiretap coding is modular and can be implemented easily by including an extra preprocessing layer over the existing transmission codes.
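As a concrete instance of a 2-universal hash family, the sketch below uses the classic construction h(x) = ((a * x + b) mod p) mod m with a prime p and randomly drawn a and b. This is one standard family chosen for illustration, not necessarily the one used in the paper, and the sketch does not implement secret key agreement or wiretap coding. The prime, the output size of 256, the two inputs, and the helper name `sample_hash` are assumptions.

```python
import random

P = 2_147_483_647  # the Mersenne prime 2**31 - 1; inputs are assumed smaller than P


def sample_hash(m, rng):
    """Draw one member h(x) = ((a * x + b) mod P) mod m of the family."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m


# Empirically estimate the collision probability for one fixed pair of inputs.
rng = random.Random(0)
x, y = 123456, 654321
trials = 20000
collisions = 0
for _ in range(trials):
    h = sample_hash(256, rng)
    collisions += h(x) == h(y)
rate = collisions / trials
```

The defining property is that, for any two distinct inputs, the probability of a collision over the random choice of hash function is roughly 1/m, here about 1/256; the empirical `rate` should land close to that value.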
Abstract:
In this work, we describe a system that recognises open-vocabulary, isolated, online handwritten Tamil words and extend it to recognise a paragraph of writing. We explain in detail each step involved in the process: segmentation, preprocessing, feature extraction, classification and bigram-based post-processing. On our database of 45,000 handwritten words obtained through a tablet PC, we have obtained symbol-level accuracies of 78.5% and 85.3% without and with post-processing using symbol-level language models, respectively. Word-level accuracies for the same are 40.1% and 59.6%. A line- and word-level segmentation strategy is proposed, which gives promising results of 100% line segmentation and 98.1% word segmentation accuracy on our initial trials of 40 handwritten paragraphs. The two modules have been combined to obtain a full-fledged page recognition system for online handwritten Tamil data. To the knowledge of the authors, this is the first-ever attempt at recognition of open-vocabulary, online handwritten paragraphs in any Indian language.
Abstract:
In this paper we introduce four scenario Cluster based Lagrangian Decomposition (CLD) procedures for obtaining strong lower bounds to the (optimal) solution value of two-stage stochastic mixed 0-1 problems. At each iteration of the Lagrangian-based procedures, the traditional aim consists of obtaining the solution value of the corresponding Lagrangian dual by solving scenario submodels once the nonanticipativity constraints have been dualized. Instead of considering a splitting variable representation over the set of scenarios, we propose to decompose the model into a set of scenario clusters. We compare the computational performance of the four Lagrange multiplier updating procedures, namely the Subgradient Method, the Volume Algorithm, the Progressive Hedging Algorithm and the Dynamic Constrained Cutting Plane scheme, for different numbers of scenario clusters and different dimensions of the original problem. Our computational experience shows that the CLD bound and its computational effort depend on the number of scenario clusters considered. In any case, our results show that the CLD procedures outperform the traditional LD scheme for single scenarios both in the quality of the bounds and in computational effort. All the procedures have been implemented in a C++ experimental code. A broad computational experience is reported on a testbed of randomly generated instances by using the MIP solvers COIN-OR and CPLEX for the auxiliary mixed 0-1 cluster submodels, this last solver within the open source engine COIN-OR. We also give computational evidence of the model tightening effect that preprocessing techniques, cut generation and appending, and parallel computing tools have in stochastic integer optimization. Finally, we have observed that the plain use of both solvers does not provide the optimal solution of the instances included in the testbed with which we have experimented, except for two toy instances, in affordable elapsed time.
On the other hand, the proposed procedures provide strong lower bounds (or the same solution value) in a considerably shorter elapsed time for the quasi-optimal solution obtained by other means for the original stochastic problem.
Abstract:
This research describes three studies on the use of chemometric methods for the classification and characterization of edible vegetable oils and their quality parameters by Fourier transform mid-infrared molecular absorption spectrometry and near-infrared spectrometry, and for monitoring the quality and oxidative stability of yogurt using molecular fluorescence spectrometry. The first and second studies address the classification and characterization of quality parameters of edible vegetable oils using Fourier transform mid-infrared (FT-MIR) and near-infrared (NIR) spectrometry. The Kennard-Stone algorithm was used to select the validation set after principal component analysis (PCA). Discrimination among canola, sunflower, corn and soybean oils was investigated using SVM-DA, SIMCA and PLS-DA. Prediction of the quality parameters, refractive index and relative density of the oils, was investigated using the multivariate calibration methods partial least squares (PLS), iPLS and SVM for the FT-MIR and NIR data. Several types of preprocessing, namely first derivative, multiplicative signal correction (MSC), mean centering, orthogonal signal correction (OSC) and standard normal variate (SNV), were applied, with the root mean square error of cross-validation (RMSECV) and of prediction (RMSEP) as evaluation parameters. The methodology developed for determining refractive index and relative density and for classifying the vegetable oils is fast and direct.
The third study evaluates the oxidative stability and quality of yogurt stored at 4 °C, either exposed to direct light or kept in the dark, using parallel factor analysis (PARAFAC) of the luminescence exhibited by three fluorophores present in the yogurt, at least one of which is strongly related to the storage conditions. The fluorescent signal was identified from the emission and excitation spectra of the pure fluorescent substances, which were suggested to be vitamin A, tryptophan and riboflavin. Regression models based on the PARAFAC scores for riboflavin were developed using the scores obtained on the first day as the dependent variable and the scores obtained during storage as the independent variable. The analytical curve visibly decayed over the course of the experiment. Therefore, the riboflavin content can be considered a good indicator of yogurt stability. It is thus possible to conclude that fluorescence spectroscopy combined with chemometric methods is a fast method for monitoring the oxidative stability and quality of yogurt.
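Of the spectral preprocessing steps named for the oil studies, SNV is the simplest to illustrate: each spectrum is centred on its own mean and divided by its own standard deviation. The sketch below is a minimal version under assumptions. It uses the sample standard deviation (n - 1 denominator), and the spectra are made-up numbers, not data from the thesis.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own statistics."""
    mu = mean(spectrum)
    sd = stdev(spectrum)  # sample standard deviation (n - 1), an assumption
    return [(x - mu) / sd for x in spectrum]


# Made-up spectrum and a copy distorted by a gain of 2 and a baseline offset of 0.05.
raw = [0.10, 0.30, 0.50, 0.30, 0.10]
distorted = [2 * x + 0.05 for x in raw]

raw_snv = snv(raw)
distorted_snv = snv(distorted)
```

The two transformed spectra coincide up to rounding, which shows why SNV is used to remove per-sample multiplicative and additive effects before calibration or classification.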
Abstract:
This work comprises two different case studies. The first concerned a pharmaceutical product, for which a methodology was developed to determine norfloxacin (NOR) by molecular spectrofluorimetry, with validation by HPLC. First, a spectrofluorimetric methodology was developed, with preliminary tests carried out to establish which pH value would give the highest emission intensity. With the pH fixed, NOR was determined in aqueous standards and in solutions of the drug using univariate calibration. The working concentration range was 0-500 μg.L-1. The limit of detection for the drug was 6.9 μg.L-1, while the limit of quantification was 24.6 μg.L-1. In addition, other figures of merit were estimated during method development and gave very satisfactory results; for example, in the recovery tests the analyte recovery ranged from 99.5 to 103.8%. To identify and quantify NOR in urine, it was necessary to dilute the urine sample (studied at two dilution levels: 500x and 1000x) and to use the standard addition method (over the same concentration range used for the drug). After acquisition, all spectra were used to build the tensor for PARAFAC. It was possible to estimate figures of merit such as limits of detection of 11.4 μg.L-1 and 8.4 μg.L-1 (500x and 1000x dilution, respectively) and limits of quantification of 34 μg.L-1 and 25.6 μg.L-1 (500x and 1000x dilution, respectively). The second case study was in the food area, in which NIR and FT-MIR spectroscopy coupled with chemometrics were used to discriminate transgenic from non-transgenic soybean oil. The oil spectra showed no visually significant differences, so chemometric tools capable of making this distinction were needed.
For both NIR and FT-MIR spectroscopy, PCA was performed to identify outlying samples that would negatively influence the model. After PCA, three different techniques were used to discriminate the oils: SIMCA, SVM-DA and PLS-DA, each with different preprocessing methods. For NIR, only one preprocessing method gave satisfactory results with all three techniques, whereas for FT-MIR, PLS-DA achieved 100% correct classification with all preprocessing methods.
Abstract:
The main contribution of this work is to analyze and describe the state-of-the-art performance of answer scoring systems from the SemEval-2013 task, as well as to continue the development of an answer scoring system (EHU-ALM) developed at the University of the Basque Country. Overall, this master's thesis focuses on finding any configuration that improves the results on the SemEval dataset by using attribute engineering techniques to find optimal feature subsets, along with trying different hierarchical configurations in order to analyze their performance against the traditional one-versus-all approach. Throughout the work we propose two alternative strategies: on the one hand, to improve the EHU-ALM system without changing the architecture, and, on the other hand, to improve the system by adapting it to a hierarchical configuration. To build such new models we describe and use distinct attribute engineering, data preprocessing, and machine learning techniques.
Abstract:
Biodiesel has been widely used as a renewable energy source that helps reduce demand for mineral diesel. Several properties must therefore be monitored in order to produce and distribute biodiesel of the required quality. In this work, physical properties of biodiesel, namely density, refractive index and cold filter plugging point, were measured and related to near-infrared (NIR) and mid-infrared (Mid-IR) spectrometry using chemometric tools. Partial least squares regression (PLS), interval partial least squares regression (iPLS) and support vector machine regression (SVM) with variable selection by genetic algorithm (GA) were used to model these properties. The biodiesel samples were synthesized from different sources, such as canola, sunflower, corn and soybean. Additional biodiesel samples were purchased from a supplier in southern Brazil. First, baseline correction was used to normalize the NIR spectral data, followed by other types of preprocessing, such as mean centering, first derivative and standard normal variate. The best result for predicting the cold filter plugging point was obtained with the Mid-IR spectra and GA-SVM regression, with a high coefficient of determination of prediction, R2Pred = 0.96, and a low root mean square error of prediction, RMSEP (°C) = 0.6. For the density prediction model, the best result was obtained with the Mid-IR spectra and PLS regression, with R2Pred = 0.98 and RMSEP (g/cm3) = 0.0002. For the refractive index prediction model, the best result was obtained with the Mid-IR spectra and PLS regression, with an excellent R2Pred = 0.98 and RMSEP = 0.0001.
For these data sets, PLS and SVM proved robust and are useful tools for predicting the biodiesel properties studied.
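The two figures of merit quoted above, RMSEP and R2Pred, can be computed from reference and predicted values of a prediction set as sketched below. The abstract does not give the exact definitions used, so the sketch assumes the common forms RMSEP = sqrt(mean((y - y_hat)^2)) and R2 = 1 - SS_res / SS_tot. The numbers are made up for illustration and are not data from the thesis.

```python
from math import sqrt


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over an external prediction set."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up reference and predicted values, e.g. a temperature-like property.
y_true = [-4.0, -2.0, 0.0, 2.0, 4.0]
y_pred = [-3.5, -2.5, 0.5, 1.5, 4.5]

error = rmsep(y_true, y_pred)
fit = r2(y_true, y_pred)
```

RMSEP carries the units of the predicted property, which is why the abstract reports it in °C for the cold filter plugging point and in g/cm3 for density.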