31 results for on-disk data layout
in Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Abstract:
In this paper we exploit the nonlinear properties of SiC multilayer devices to design an optical processor for error detection that enables reliable delivery of spectral data of four-wave mixing over unreliable communication channels. The SiC optical processor is realized by using a double pin/pin a-SiC:H photodetector with front and back biased optical gating elements. Visible pulsed signals are transmitted together with different bit sequences. The combined optical signal is analyzed. Data show that the background acts as a selector that picks one or more states by splitting portions of the multiple input optical signals across the front and back photodiodes. Boolean operations such as EXOR and three-bit addition are demonstrated optically, showing that when one or all of the inputs are present, the system behaves as an XOR gate representing the SUM. When two or three inputs are on, the system acts as an AND gate, indicating the presence of the CARRY bit. Additional parity logic operations are performed using four incoming pulsed communication channels that are transmitted and checked for errors together. As a simple example of this approach, we describe an all-optical processor for error detection and then provide an experimental demonstration of this idea. (C) 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
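The gate behaviour described above corresponds, in digital terms, to a one-bit full adder plus a parity check. The following is a minimal sketch of that equivalent digital logic, not of the optical device itself; all names and the example bit patterns are illustrative.

```python
# Minimal sketch (not the optical implementation): the digital logic that the
# SiC double pin/pin processor is reported to reproduce optically.

def full_adder(a, b, c):
    """Three one-bit inputs -> (SUM, CARRY).
    SUM is the XOR of the inputs (high when one or all three are on);
    CARRY is high when two or more inputs are on (AND-like, majority behaviour)."""
    total = a + b + c
    sum_bit = total % 2            # XOR of the three inputs
    carry_bit = 1 if total >= 2 else 0
    return sum_bit, carry_bit

def even_parity(bits):
    """Parity bit for a group of channels: 0 when the number of 1s is even."""
    return sum(bits) % 2

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                print((a, b, c), "->", full_adder(a, b, c))
    # Four pulsed channels transmitted together and checked for errors via parity.
    channels = [1, 0, 1, 1]
    print("parity bit:", even_parity(channels))
```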
Abstract:
In machine learning and pattern recognition tasks, the use of feature discretization techniques may have several advantages. The discretized features may hold enough information for the learning task at hand, while ignoring minor fluctuations that are irrelevant or harmful for that task. The discretized features have more compact representations that may yield both better accuracy and lower training time, as compared to the use of the original features. However, in many cases, mainly with medium and high-dimensional data, the large number of features usually implies that there is some redundancy among them. Thus, we may further apply feature selection (FS) techniques on the discrete data, keeping the most relevant features, while discarding the irrelevant and redundant ones. In this paper, we propose relevance and redundancy criteria for supervised feature selection techniques on discrete data. These criteria are applied to the bin-class histograms of the discrete features. The experimental results, on public benchmark data, show that the proposed criteria can achieve better accuracy than widely used relevance and redundancy criteria, such as mutual information and the Fisher ratio.
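The abstract does not spell out the proposed criteria, so the following is only a minimal sketch of the kind of computation involved: building the bin-class histogram of a discretized feature and scoring its relevance with mutual information, one of the baseline criteria mentioned above. The function names and the synthetic data are illustrative.

```python
import numpy as np

# Sketch: bin-class histogram of one discrete feature, scored with mutual
# information (a baseline relevance criterion; not the paper's own criterion).

def bin_class_histogram(feature, labels, n_bins, n_classes):
    """Counts of (bin value, class label) pairs for one discretized feature."""
    hist = np.zeros((n_bins, n_classes))
    for b, c in zip(feature, labels):
        hist[b, c] += 1
    return hist

def mutual_information(hist):
    """MI between a discrete feature and the class, from its bin-class histogram."""
    p = hist / hist.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal over bins
    py = p.sum(axis=0, keepdims=True)   # marginal over classes
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=200)
    feature = labels * 2 + rng.integers(0, 2, size=200)   # 4 bins, class-dependent
    h = bin_class_histogram(feature, labels, n_bins=4, n_classes=2)
    print("relevance (MI):", mutual_information(h))
```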
Abstract:
Dimensionality reduction plays a crucial role in many hyperspectral data processing and analysis algorithms. This paper proposes a new mean squared error based approach to determine the signal subspace in hyperspectral imagery. The method first estimates the signal and noise correlation matrices, and then selects the subset of eigenvalues that best represents the signal subspace in the least-squares sense. The effectiveness of the proposed method is illustrated using simulated and real hyperspectral images.
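A simplified sketch of the eigenvalue-selection idea is given below, assuming the noise correlation matrix is already available (in practice it must be estimated from the data, for example by regression between adjacent bands). The selection rule shown, keeping a direction when its signal power outweighs the noise power it admits, is an assumption meant to illustrate the least-squares trade-off, not the paper's exact criterion.

```python
import numpy as np

def signal_subspace_dimension(Y, Rn):
    """Y: (bands, pixels) observations; Rn: (bands, bands) noise correlation.
    Keeps the eigen-directions whose signal power outweighs the noise power
    they let through, i.e. the choice that lowers the mean squared error."""
    n = Y.shape[1]
    Ry = (Y @ Y.T) / n                  # observation correlation
    Rs = Ry - Rn                        # signal correlation estimate
    w, E = np.linalg.eigh(Rs)           # eigenvectors of the signal correlation
    keep = []
    for i in range(E.shape[1]):
        e = E[:, i]
        p_sig = float(e @ Rs @ e)       # signal power along this direction
        p_noise = float(e @ Rn @ e)     # noise power along this direction
        if p_sig > 2.0 * p_noise:       # including it reduces the MSE
            keep.append(e)
    return len(keep), np.array(keep).T  # subspace dimension and basis

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bands, pixels, endmembers = 50, 5000, 3
    M = rng.random((bands, endmembers))              # endmember signatures
    A = rng.dirichlet(np.ones(endmembers), pixels).T
    noise_std = 0.01
    Y = M @ A + noise_std * rng.standard_normal((bands, pixels))
    k, _ = signal_subspace_dimension(Y, (noise_std ** 2) * np.eye(bands))
    print("estimated signal subspace dimension:", k)
```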
Abstract:
Master's degree in Accounting and Management of Financial Institutions
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover the ground truth. An application to real data from official statistics shows its usefulness.
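The sketch below illustrates the kind of generative model involved: a mixture of categorical components in which each feature has a saliency weight, so that non-salient features are explained by a common, cluster-independent distribution. Only the E-step responsibilities are shown; the variable names, the synthetic data, and the exact parameterisation are illustrative assumptions rather than the paper's formulation, and the M-step and minimum message length penalty are omitted.

```python
import numpy as np

def responsibilities(X, pi, theta, q, rho):
    """X: (n, d) integer categories; pi: (K,) mixture weights;
    theta: (K, d, V) cluster-specific category probabilities;
    q: (d, V) common category probabilities; rho: (d,) feature saliencies.
    Returns the (n, K) posterior probabilities of cluster membership."""
    n, d = X.shape
    K = len(pi)
    resp = np.zeros((n, K))
    for k in range(K):
        # per-feature mixture of the "relevant" and "irrelevant" explanations
        probs = rho * theta[k][np.arange(d), X] + (1.0 - rho) * q[np.arange(d), X]
        resp[:, k] = pi[k] * probs.prod(axis=1)
    resp /= resp.sum(axis=1, keepdims=True)
    return resp

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d, V, K = 100, 5, 3, 2
    X = rng.integers(0, V, size=(n, d))
    pi = np.full(K, 1.0 / K)
    theta = rng.dirichlet(np.ones(V), size=(K, d))
    q = rng.dirichlet(np.ones(V), size=d)
    rho = np.full(d, 0.5)
    print(responsibilities(X, pi, theta, q, rho)[:3])
```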
Abstract:
Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of the model estimation and the selection of the number of clusters in a single algorithm, rather than selecting this number from a set of pre-estimated candidate models. The performance of our approach is compared with the use of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) on synthetic data. The results obtained illustrate the capacity of the proposed algorithm to attain the true number of clusters while outperforming BIC and ICL, since it is faster, which is especially relevant when dealing with large data sets.
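The component-annihilation idea borrowed from Figueiredo and Jain (2002) can be illustrated with the penalised weight update sketched below: mixture weights are re-estimated with a penalty of half the number of parameters per component, so clusters with too little support are driven to zero during the EM run itself. The function, the penalty bookkeeping, and the synthetic responsibilities are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mml_weight_update(resp, params_per_component):
    """resp: (n, K) responsibilities from the E-step.
    Returns the penalised mixture weights; zeros mean annihilated clusters."""
    support = resp.sum(axis=0)                        # expected counts per cluster
    w = np.maximum(support - params_per_component / 2.0, 0.0)
    total = w.sum()
    if total == 0.0:
        raise ValueError("all components annihilated; fewer clusters than assumed")
    return w / total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    resp = rng.dirichlet(np.ones(4), size=300)        # fake responsibilities, K = 4
    resp[:, 3] *= 0.01                                # one nearly unsupported cluster
    resp /= resp.sum(axis=1, keepdims=True)
    # e.g. 5 categorical features with 4 categories each -> 5 * (4 - 1) parameters
    print(mml_weight_update(resp, params_per_component=15))
```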
Abstract:
Experimental optoelectronic characterization of a p-i'(a-SiC:H)-n/pi(a-Si:H)-n heterostructure with low-conductivity doped layers shows the feasibility of tailoring channel bandwidth and wavelength by optical bias through back and front side illumination. A front background enhances the light-to-dark sensitivity in the long and medium wavelength ranges and strongly quenches the others. A back violet background enhances the magnitude in the short wavelength range and reduces the others. Experiments have three distinct programmed time slots: control, hibernation and data. Throughout the control time slot, steady light wavelengths illuminate either or both sides of the device, followed by hibernation without any background illumination. The third time slot allows a programmable sequence of different wavelengths, with an impulse frequency of 6000 Hz, to shine upon the sensor. Results show that the control time slot illumination has an influence on the data time slot, which is used as a volatile memory with set and reset logical functions. © IFIP International Federation for Information Processing 2015.
Abstract:
The development of high spatial resolution airborne and spaceborne sensors has improved the capability of ground-based data collection in the fields of agriculture, geography, geology, mineral identification, detection [2, 3], and classification [4–8]. The signal read by the sensor from a given spatial element of resolution and at a given spectral band is a mixture of components originating from the constituent substances, termed endmembers, located at that element of resolution. This chapter addresses hyperspectral unmixing, which is the decomposition of the pixel spectra into a collection of constituent spectra, or spectral signatures, and their corresponding fractional abundances indicating the proportion of each endmember present in the pixel [9, 10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. The linear mixing model holds when the mixing scale is macroscopic [13]. The nonlinear model holds when the mixing scale is microscopic (i.e., intimate mixtures) [14, 15]. The linear model assumes negligible interaction among distinct endmembers [16, 17]. The nonlinear model assumes that incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [18]. Under the linear mixing model, and assuming that the number of endmembers and their spectral signatures are known, hyperspectral unmixing is a linear problem, which can be addressed, for example, under the maximum likelihood setup [19], the constrained least-squares approach [20], spectral signature matching [21], the spectral angle mapper [22], and subspace projection methods [20, 23, 24]. Orthogonal subspace projection [23] reduces the data dimensionality, suppresses undesired spectral signatures, and detects the presence of a spectral signature of interest. The basic concept is to project each pixel onto a subspace that is orthogonal to the undesired signatures. As shown in Settle [19], the orthogonal subspace projection technique is equivalent to the maximum likelihood estimator. This projection technique was extended by three unconstrained least-squares approaches [24] (signature space orthogonal projection, oblique subspace projection, and target signature space orthogonal projection). Other works using the maximum a posteriori probability (MAP) framework [25] and projection pursuit [26, 27] have also been applied to hyperspectral data. In most cases the number of endmembers and their signatures are not known. Independent component analysis (ICA) is an unsupervised source separation process that has been applied with success to blind source separation, feature extraction, and unsupervised recognition [28, 29]. ICA consists of finding a linear decomposition of observed data that yields statistically independent components. Given that hyperspectral data are, under certain circumstances, linear mixtures, ICA comes to mind as a possible tool to unmix this class of data. In fact, the application of ICA to hyperspectral data has been proposed in reference 30, where endmember signatures are treated as sources and the mixing matrix is composed of the abundance fractions, and in references 9, 25, and 31–38, where the sources are the abundance fractions of each endmember. In the first approach, we face two problems: (1) the number of samples is limited to the number of channels, and (2) the process of pixel selection, playing the role of mixed sources, is not straightforward.
In the second approach, ICA is based on the assumption of mutually independent sources, which is not the case for hyperspectral data, since the sum of the abundance fractions is constant, implying dependence among abundances. This dependence compromises the applicability of ICA to hyperspectral images. In addition, hyperspectral data are immersed in noise, which degrades the ICA performance. IFA [39] was introduced as a method for recovering independent hidden sources from their observed noisy mixtures. IFA implements two steps. First, source densities and noise covariance are estimated from the observed data by maximum likelihood. Second, sources are reconstructed by an optimal nonlinear estimator. Although IFA is a well-suited technique to unmix independent sources under noisy observations, the dependence among abundance fractions in hyperspectral imagery compromises, as in the ICA case, the IFA performance. Under the linear mixing model, hyperspectral observations lie in a simplex whose vertices correspond to the endmembers. Several approaches [40–43] have exploited this geometric feature of hyperspectral mixtures [42]. The minimum volume transform (MVT) algorithm [43] determines the simplex of minimum volume containing the data. The MVT-type approaches are complex from the computational point of view. Usually, these algorithms first find the convex hull defined by the observed data and then fit a minimum volume simplex to it. Aiming at lower computational complexity, some algorithms, such as vertex component analysis (VCA) [44], the pixel purity index (PPI) [42], and N-FINDR [45], still find the minimum volume simplex containing the data cloud, but they assume the presence in the data of at least one pure pixel of each endmember. This is a strong requirement that may not hold in some data sets. In any case, these algorithms find the set of most pure pixels in the data. Hyperspectral sensors collect spatial images over many narrow contiguous bands, yielding large amounts of data. For this reason, very often, the processing of hyperspectral data, including unmixing, is preceded by a dimensionality reduction step to reduce computational complexity and to improve the signal-to-noise ratio (SNR). Principal component analysis (PCA) [46], maximum noise fraction (MNF) [47], and singular value decomposition (SVD) [48] are three well-known projection techniques widely used in remote sensing in general and in unmixing in particular. The newly introduced method [49] exploits the structure of hyperspectral mixtures, namely the fact that spectral vectors are nonnegative. The computational complexity associated with these techniques is an obstacle to real-time implementations. To overcome this problem, band selection [50] and non-statistical [51] algorithms have been introduced. This chapter addresses hyperspectral data source dependence and its impact on ICA and IFA performance. The study considers simulated and real data and is based on mutual information minimization. Hyperspectral observations are described by a generative model. This model takes into account the degradation mechanisms normally found in hyperspectral applications, namely signature variability [52–54], abundance constraints, topography modulation, and system noise. The computation of mutual information is based on fitting mixtures of Gaussians (MOG) to the data. The MOG parameters (number of components, means, covariances, and weights) are inferred using a minimum description length (MDL) based algorithm [55].
We study the behavior of the mutual information as a function of the unmixing matrix. The conclusion is that the unmixing matrix minimizing the mutual information might be very far from the true one. Nevertheless, some abundance fractions might be well separated, mainly in the presence of strong signature variability, a large number of endmembers, and high SNR. We end this chapter by sketching a new methodology to blindly unmix hyperspectral data, where abundance fractions are modeled as a mixture of Dirichlet sources. This model enforces the positivity and constant-sum (full additivity) constraints on the sources. The mixing matrix is inferred by an expectation-maximization (EM)-type algorithm. This approach is in the vein of references 39 and 56, replacing independent sources represented by MOG with a mixture of Dirichlet sources. Compared with the geometric-based approaches, the advantage of this model is that there is no need for pure pixels in the observations. The chapter is organized as follows. Section 6.2 presents a spectral radiance model and formulates spectral unmixing as a linear problem accounting for abundance constraints, signature variability, topography modulation, and system noise. Section 6.3 presents a brief summary of the ICA and IFA algorithms. Section 6.4 illustrates the performance of IFA and of some well-known ICA algorithms with experimental data. Section 6.5 studies the ICA and IFA limitations in unmixing hyperspectral data. Section 6.6 presents results of ICA based on real data. Section 6.7 describes the new blind unmixing scheme and some illustrative examples. Section 6.8 concludes with some remarks.
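As a concrete reference point for the linear mixing model discussed in this chapter, the sketch below performs supervised unmixing of a single pixel by constrained least squares, assuming the endmember signatures are known. The sum-to-one constraint is enforced only approximately through a heavily weighted extra equation, and the data are synthetic; this is a baseline illustration, not the Dirichlet-based method sketched above.

```python
import numpy as np
from scipy.optimize import nnls

def unmix_fcls(y, M, weight=1e3):
    """y: (bands,) observed pixel; M: (bands, p) endmember signatures.
    Returns nonnegative abundances that approximately sum to one."""
    A = np.vstack([M, weight * np.ones((1, M.shape[1]))])  # extra sum-to-one row
    b = np.concatenate([y, [weight]])
    a, _ = nnls(A, b)                                       # nonnegative least squares
    return a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bands, p = 100, 3
    M = rng.random((bands, p))                      # endmember spectra
    a_true = np.array([0.6, 0.3, 0.1])              # abundance fractions
    y = M @ a_true + 0.001 * rng.standard_normal(bands)
    print("true:     ", a_true)
    print("estimated:", np.round(unmix_fcls(y, M), 3))
```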
Abstract:
The liberalisation of the electricity sector, and the consequent creation of regulated and liberalised electricity markets, changed the way electricity is traded. In particular, it allowed companies to enter the generation and retail activities, increasing competitiveness and ensuring consumers' freedom to choose the electricity supplier they prefer. Competitiveness in the electricity sector increased the need for the companies operating in it to propose more attractive prices (than those offered by competitors), and contributed to the development of market strategies that attract more customers and increase energy and economic efficiency. Electricity trading can take place in organised markets or through direct contracting between retailers and consumers, using physical bilateral contracts. These contracts allow electricity prices to be negotiated between retailers and consumers. Currently, several computational tools exist for simulating electricity markets. The existing simulators allow the simulation of transactions in energy exchanges, price negotiation through bilateral contracts, and technical analyses of power networks. However, due to the complexity of electrical power systems, these simulators have some limitations. This dissertation presents a simulator of bilateral contracts in electricity markets, with emphasis on an alternating offers protocol, developed using multi-agent technology. Briefly, an alternating offers protocol is an interaction protocol that defines the rules of negotiation between a seller agent (for example, a retailer) and a buyer agent (for example, an end consumer). The simulator was applied to a practical case based on real data. The results obtained allow us to conclude that the simulator, although simplified, can be an important tool to support the decisions inherent in the negotiation of bilateral contracts in electricity markets.
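To make the idea of an alternating offers protocol concrete, the sketch below simulates a seller agent and a buyer agent exchanging price proposals with simple concession rules. The strategies, limits and price values are illustrative assumptions, not the agent model implemented in the dissertation.

```python
# Minimal sketch of an alternating-offers interaction between a seller agent
# (e.g. a retailer) and a buyer agent (e.g. an end consumer) negotiating an
# electricity price. All parameters below are illustrative assumptions.

def alternating_offers(seller_start, seller_limit, buyer_start, buyer_limit,
                       concession=0.05, max_rounds=20):
    """Each round one side proposes a price (EUR/MWh); the other accepts if the
    price respects its own limit, otherwise it counter-offers with a concession."""
    seller_price, buyer_price = seller_start, buyer_start
    for round_no in range(1, max_rounds + 1):
        # seller proposes; buyer accepts any price at or below its limit
        if seller_price <= buyer_limit:
            return round_no, seller_price
        buyer_price = min(buyer_price * (1 + concession), buyer_limit)
        # buyer counter-offers; seller accepts any price at or above its limit
        if buyer_price >= seller_limit:
            return round_no, buyer_price
        seller_price = max(seller_price * (1 - concession), seller_limit)
    return None  # no agreement within the allowed number of rounds

if __name__ == "__main__":
    print(alternating_offers(seller_start=70.0, seller_limit=55.0,
                             buyer_start=45.0, buyer_limit=60.0))
```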
Abstract:
The financial literature and the financial industry often use zero-coupon yield curves as input for testing hypotheses, pricing assets or managing risk. They assume the provided data are accurate. We analyse the implications of the methodology and of the sample selection criteria used to estimate the zero-coupon bond yield term structure for the resulting volatility of spot rates with different maturities. We obtain the volatility term structure using historical volatilities and EGARCH volatilities. As input for these volatilities we consider our own spot rate estimates from GovPX bond data and three popular interest rate data sets: from the Federal Reserve Board, from the US Department of the Treasury (H15), and from Bloomberg. We find strong evidence that the resulting zero-coupon bond yield volatility estimates, as well as the correlation coefficients among spot and forward rates, depend significantly on the data set. We observe relevant differences in economic terms when volatilities are used to price derivatives.
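As a minimal illustration of the historical-volatility side of this comparison, the sketch below computes an annualised volatility term structure from a panel of spot rates. The rates are synthetic placeholders and the EGARCH estimation is not reproduced.

```python
import numpy as np

def volatility_term_structure(spot_rates, trading_days=252):
    """spot_rates: (days, maturities) array of zero-coupon rates in percent.
    Returns the annualised standard deviation of daily changes per maturity."""
    daily_changes = np.diff(spot_rates, axis=0)
    return daily_changes.std(axis=0, ddof=1) * np.sqrt(trading_days)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maturities = [0.25, 1, 2, 5, 10]                 # years
    # random-walk spot rates, shorter maturities made slightly more volatile
    shocks = rng.standard_normal((500, len(maturities))) * [0.06, 0.05, 0.04, 0.03, 0.03]
    rates = 3.0 + np.cumsum(shocks, axis=0)
    for m, v in zip(maturities, volatility_term_structure(rates)):
        print(f"{m:>5} y: {v:.2f} (annualised, percentage points)")
```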
Abstract:
Exposure to formaldehyde is recognised as one of the most important risk factors present in hospital pathological anatomy laboratories. In this occupational setting, formaldehyde is used in solution, commonly referred to as formalin. This is a commercial formaldehyde solution, normally diluted to 10%, which is inexpensive and is therefore the one chosen for routine work in pathological anatomy. The solution is used as a fixative and preservative of biological material, so the anatomical specimens to be processed are previously impregnated with it. Regarding carcinogenic effects, the first assessment by the International Agency for Research on Cancer dates from 1981, updated in 1982, 1987, 1995 and 2004, classifying it as a Group 2A agent (probably carcinogenic). However, the most recent assessment, in 2006, places formaldehyde in Group 1 (carcinogenic agent), based on evidence that exposure to this agent is likely to cause nasopharyngeal cancer in humans. The main objective of the study was to characterise occupational exposure to formaldehyde in hospital pathological anatomy laboratories. The study covered 10 hospital pathological anatomy laboratories located in mainland Portugal. Workers' exposure was assessed considering three professional groups (Pathological Anatomy Technicians, Pathologists and Auxiliary Staff), by comparison with two exposure references (VLE-CM and VLE-MP), and also considering the maximum concentration values in 83 activities carried out in the laboratories in the sample. Two distinct environmental assessment methods were applied simultaneously: one method (Method 1) used direct-reading equipment based on Photo Ionization Detection with an 11.7 eV lamp, while the work activity was recorded at the same time, providing data for the maximum concentration (CM) exposure reference; the other method (Method 2) consisted of applying the NIOSH 2541 method, involving low-flow electric sampling pumps and subsequent analytical processing of the samples by gas chromatography, which in turn provided data for the time-weighted average concentration (CMP) exposure reference. The simultaneous application of the two environmental assessment methods produced distinct, but not contradictory, results regarding the assessment of occupational exposure to formaldehyde. For the activities studied (n=83), about 93% of the values were above the exposure limit value defined for the maximum concentration (VLE-CM=0.3 ppm). The "macroscopic examination" was the most studied activity and the one with the highest prevalence of results above the limit value (92.8%). The highest mean maximum concentration value (2.04 ppm) was found in the exposure group of the Pathological Anatomy Technicians. However, the widest range of results was observed in the group of Pathologists (0.21 ppm to 5.02 ppm). Regarding the time-weighted average concentration reference, all the values obtained in the 10 laboratories studied for the three exposure groups were below the exposure limit value defined by the Occupational Safety and Health Administration (TLV-TWA=0.75 ppm).
Since the elimination of formaldehyde is not foreseeable in the short term, given the large number of activities that still involve the use of its commercial solution (formalin), it can be concluded that exposure to this agent in this specific occupational setting is a matter of concern, requiring prompt intervention aimed at minimising exposure and preventing potential health effects in exposed workers.
Abstract:
Master's degree in Physiotherapy.
Abstract:
Audimeter systems provide enormous amounts of detailed TV watching data. Several relevant and interdependent factors may influence TV viewers' behavior. In this work we focus on the time factor and derive temporal patterns of TV watching based on panel data. The clustering base attributes originate from 1440 binary minute-related attributes, capturing the TV watching status (watch/not watch). Since there are around 2500 panel viewers, a data reduction procedure is first performed. The K-Means algorithm is used to obtain daily clusters of viewers. Weekly patterns, which rely on the daily patterns, are then derived. The obtained solutions are tested for consistency and stability. The temporal TV watching patterns provide new insights into Portuguese TV viewers' behavior.
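The clustering step can be illustrated with the sketch below, which applies K-Means to synthetic viewer-day vectors of 1440 binary minute attributes. The synthetic viewing profiles, the number of clusters and all names are illustrative assumptions; the data-reduction step and the derivation of weekly patterns are not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans

# Each viewer-day is a 1440-dimensional binary vector (one attribute per
# minute, watch / not watch); K-Means groups the days into daily patterns.
rng = np.random.default_rng(0)
n_viewers, minutes = 300, 1440

profiles = np.zeros((2, minutes))
profiles[0, 19 * 60:23 * 60] = 0.8          # "prime-time" viewers
profiles[1, 12 * 60:15 * 60] = 0.7          # "afternoon" viewers
membership = rng.integers(0, 2, size=n_viewers)
X = (rng.random((n_viewers, minutes)) < profiles[membership]).astype(int)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
for k in range(2):
    centroid = kmeans.cluster_centers_[k]
    peak = int(centroid.argmax())
    print(f"cluster {k}: {np.sum(kmeans.labels_ == k)} viewer-days, "
          f"peak viewing around {peak // 60:02d}:{peak % 60:02d}")
```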
Abstract:
Objectives - To review available guidance for quality assurance (QA) in mammography and discuss its contribution to harmonising practices worldwide. Methods - A literature search was performed on different sources to identify guidance documents for QA in mammography available worldwide from international bodies, healthcare providers and professional/scientific associations. The guidance documents identified were reviewed, and a selection was compared for type of guidance (clinical/technical), technology, and proposed QA methodologies, focusing on dose and image quality (IQ) performance assessment. Results - Fourteen protocols (targeted at conventional and digital mammography) were reviewed. All included recommendations for testing the acquisition, processing and display systems associated with mammographic equipment. All guidance reviewed highlighted the importance of dose assessment and of testing the Automatic Exposure Control (AEC) system. Recommended tests for assessment of IQ showed variations in the proposed methodologies. Recommended testing focused on assessment of low-contrast detection, spatial resolution and noise. QC of image display is recommended following the American Association of Physicists in Medicine guidelines. Conclusions - The existing QA guidance for mammography is derived from key documents (American College of Radiology and European Union guidelines) and proposes similar tests, despite variations in detail and methodology. Studies reporting QA data should provide details of the experimental technique to allow robust data comparison. Countries aiming to implement a mammography QA programme may select/prioritise the tests depending on available technology and resources.
Abstract:
Final Master's project for obtaining the degree of Master in Civil Engineering, specialisation area of Buildings