969 resultados para on-disk data layout


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, referred to official statistics, shows its usefulness.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Research on cluster analysis for categorical data continues to develop, new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters is done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length criterion (MML) to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the (Figueiredo and Jain, 2002) approach. The novelty of the approach rests on the integration of the model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of Bayesian Information Criterion (BIC) (Schwarz, 1978) and Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of cluster while outperforming BIC and ICL since it is faster, which is especially relevant when dealing with large data sets.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Future distribution systems will have to deal with an intensive penetration of distributed energy resources ensuring reliable and secure operation according to the smart grid paradigm. SCADA (Supervisory Control and Data Acquisition) is an essential infrastructure for this evolution. This paper proposes a new conceptual design of an intelligent SCADA with a decentralized, flexible, and intelligent approach, adaptive to the context (context awareness). This SCADA model is used to support the energy resource management undertaken by a distribution network operator (DNO). Resource management considers all the involved costs, power flows, and electricity prices, allowing the use of network reconfiguration and load curtailment. Locational Marginal Prices (LMP) are evaluated and used in specific situations to apply Demand Response (DR) programs on a global or a local basis. The paper includes a case study using a 114 bus distribution network and load demand based on real data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The characteristics of carbon fibre reinforced laminates had widened their use, from aerospace to domestic appliances. A common characteristic is the need of drilling for assembly purposes. It is known that a drilling process that reduces the drill thrust force can decrease the risk of delamination. In this work, delamination assessment methods based on radiographic data are compared and correlated with mechanical test results (bearing test).

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Climatic reconstructions based on palynological data from Aquitaine outcrops emphasize an important degradation phase during the Lower Serravallian. Climatic and environmental changes can be related to sea-level variations (Bur 5 / Lan 1, Lan 2 / Ser 1 and Ser 2 cycles). Transgressive phases feature warmer conditions and more open environments whereas regressive phases are marked by a cooler climate and an extent of the forest cover. From Langhian to Middle Serravallian, a general cooling is highlighted, with disappearance of most megathermic taxa and a transition from warm and dry climate to warm-temperate and much more humid conditions. Conclusions are consistent with studies on bordering areas and place the major degradation phase around 14 My. The palynologic data allow filling a gap in the climatic evolution of Southern France, as a connection between Lower and Upper Miocene, both well recorded. These results document, on Western Europe scale, latitudinal climatic gradient across Northern hemisphere while featuring a transition between Mediterranean area and northeastern Atlantic frontage.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Trabalho apresentado no âmbito do Mestrado em Engenharia Informática, como requisito parcial para obtenção do grau de Mestre em Engenharia Informática.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Experimental optoelectronic characterization of a p-i'(a-SiC:H)-n/pi(a-Si:H)-n heterostructure with low conductivity doped layers shows the feasibility of tailoring channel bandwidth and wavelength by optical bias through back and front side illumination. Front background enhances light-to-dark sensitivity of the long and medium wavelength range, and strongly quenches the others. Back violet background enhances the magnitude in short wavelength range and reduces the others. Experiments have three distinct programmed time slots: control, hibernation and data. Throughout the control time slot steady light wavelengths illuminate either or both sides of the device, followed by the hibernation without any background illumination. The third time slot allows a programmable sequence of different wavelengths with an impulse frequency of 6000Hz to shine upon the sensor. Results show that the control time slot illumination has an influence on the data time slot which is used as a volatile memory with the set, reset logical functions. © IFIP International Federation for Information Processing 2015.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The development of high spatial resolution airborne and spaceborne sensors has improved the capability of ground-based data collection in the fields of agriculture, geography, geology, mineral identification, detection [2, 3], and classification [4–8]. The signal read by the sensor from a given spatial element of resolution and at a given spectral band is a mixing of components originated by the constituent substances, termed endmembers, located at that element of resolution. This chapter addresses hyperspectral unmixing, which is the decomposition of the pixel spectra into a collection of constituent spectra, or spectral signatures, and their corresponding fractional abundances indicating the proportion of each endmember present in the pixel [9, 10]. Depending on the mixing scales at each pixel, the observed mixture is either linear or nonlinear [11, 12]. The linear mixing model holds when the mixing scale is macroscopic [13]. The nonlinear model holds when the mixing scale is microscopic (i.e., intimate mixtures) [14, 15]. The linear model assumes negligible interaction among distinct endmembers [16, 17]. The nonlinear model assumes that incident solar radiation is scattered by the scene through multiple bounces involving several endmembers [18]. Under the linear mixing model and assuming that the number of endmembers and their spectral signatures are known, hyperspectral unmixing is a linear problem, which can be addressed, for example, under the maximum likelihood setup [19], the constrained least-squares approach [20], the spectral signature matching [21], the spectral angle mapper [22], and the subspace projection methods [20, 23, 24]. Orthogonal subspace projection [23] reduces the data dimensionality, suppresses undesired spectral signatures, and detects the presence of a spectral signature of interest. The basic concept is to project each pixel onto a subspace that is orthogonal to the undesired signatures. As shown in Settle [19], the orthogonal subspace projection technique is equivalent to the maximum likelihood estimator. This projection technique was extended by three unconstrained least-squares approaches [24] (signature space orthogonal projection, oblique subspace projection, target signature space orthogonal projection). Other works using maximum a posteriori probability (MAP) framework [25] and projection pursuit [26, 27] have also been applied to hyperspectral data. In most cases the number of endmembers and their signatures are not known. Independent component analysis (ICA) is an unsupervised source separation process that has been applied with success to blind source separation, to feature extraction, and to unsupervised recognition [28, 29]. ICA consists in finding a linear decomposition of observed data yielding statistically independent components. Given that hyperspectral data are, in given circumstances, linear mixtures, ICA comes to mind as a possible tool to unmix this class of data. In fact, the application of ICA to hyperspectral data has been proposed in reference 30, where endmember signatures are treated as sources and the mixing matrix is composed by the abundance fractions, and in references 9, 25, and 31–38, where sources are the abundance fractions of each endmember. In the first approach, we face two problems: (1) The number of samples are limited to the number of channels and (2) the process of pixel selection, playing the role of mixed sources, is not straightforward. In the second approach, ICA is based on the assumption of mutually independent sources, which is not the case of hyperspectral data, since the sum of the abundance fractions is constant, implying dependence among abundances. This dependence compromises ICA applicability to hyperspectral images. In addition, hyperspectral data are immersed in noise, which degrades the ICA performance. IFA [39] was introduced as a method for recovering independent hidden sources from their observed noisy mixtures. IFA implements two steps. First, source densities and noise covariance are estimated from the observed data by maximum likelihood. Second, sources are reconstructed by an optimal nonlinear estimator. Although IFA is a well-suited technique to unmix independent sources under noisy observations, the dependence among abundance fractions in hyperspectral imagery compromises, as in the ICA case, the IFA performance. Considering the linear mixing model, hyperspectral observations are in a simplex whose vertices correspond to the endmembers. Several approaches [40–43] have exploited this geometric feature of hyperspectral mixtures [42]. Minimum volume transform (MVT) algorithm [43] determines the simplex of minimum volume containing the data. The MVT-type approaches are complex from the computational point of view. Usually, these algorithms first find the convex hull defined by the observed data and then fit a minimum volume simplex to it. Aiming at a lower computational complexity, some algorithms such as the vertex component analysis (VCA) [44], the pixel purity index (PPI) [42], and the N-FINDR [45] still find the minimum volume simplex containing the data cloud, but they assume the presence in the data of at least one pure pixel of each endmember. This is a strong requisite that may not hold in some data sets. In any case, these algorithms find the set of most pure pixels in the data. Hyperspectral sensors collects spatial images over many narrow contiguous bands, yielding large amounts of data. For this reason, very often, the processing of hyperspectral data, included unmixing, is preceded by a dimensionality reduction step to reduce computational complexity and to improve the signal-to-noise ratio (SNR). Principal component analysis (PCA) [46], maximum noise fraction (MNF) [47], and singular value decomposition (SVD) [48] are three well-known projection techniques widely used in remote sensing in general and in unmixing in particular. The newly introduced method [49] exploits the structure of hyperspectral mixtures, namely the fact that spectral vectors are nonnegative. The computational complexity associated with these techniques is an obstacle to real-time implementations. To overcome this problem, band selection [50] and non-statistical [51] algorithms have been introduced. This chapter addresses hyperspectral data source dependence and its impact on ICA and IFA performances. The study consider simulated and real data and is based on mutual information minimization. Hyperspectral observations are described by a generative model. This model takes into account the degradation mechanisms normally found in hyperspectral applications—namely, signature variability [52–54], abundance constraints, topography modulation, and system noise. The computation of mutual information is based on fitting mixtures of Gaussians (MOG) to data. The MOG parameters (number of components, means, covariances, and weights) are inferred using the minimum description length (MDL) based algorithm [55]. We study the behavior of the mutual information as a function of the unmixing matrix. The conclusion is that the unmixing matrix minimizing the mutual information might be very far from the true one. Nevertheless, some abundance fractions might be well separated, mainly in the presence of strong signature variability, a large number of endmembers, and high SNR. We end this chapter by sketching a new methodology to blindly unmix hyperspectral data, where abundance fractions are modeled as a mixture of Dirichlet sources. This model enforces positivity and constant sum sources (full additivity) constraints. The mixing matrix is inferred by an expectation-maximization (EM)-type algorithm. This approach is in the vein of references 39 and 56, replacing independent sources represented by MOG with mixture of Dirichlet sources. Compared with the geometric-based approaches, the advantage of this model is that there is no need to have pure pixels in the observations. The chapter is organized as follows. Section 6.2 presents a spectral radiance model and formulates the spectral unmixing as a linear problem accounting for abundance constraints, signature variability, topography modulation, and system noise. Section 6.3 presents a brief resume of ICA and IFA algorithms. Section 6.4 illustrates the performance of IFA and of some well-known ICA algorithms with experimental data. Section 6.5 studies the ICA and IFA limitations in unmixing hyperspectral data. Section 6.6 presents results of ICA based on real data. Section 6.7 describes the new blind unmixing scheme and some illustrative examples. Section 6.8 concludes with some remarks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper consists in the characterization of medium voltage (MV) electric power consumers based on a data clustering approach. It is intended to identify typical load profiles by selecting the best partition of a power consumption database among a pool of data partitions produced by several clustering algorithms. The best partition is selected using several cluster validity indices. These methods are intended to be used in a smart grid environment to extract useful knowledge about customers’ behavior. The data-mining-based methodology presented throughout the paper consists in several steps, namely the pre-processing data phase, clustering algorithms application and the evaluation of the quality of the partitions. To validate our approach, a case study with a real database of 1.022 MV consumers was used.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper presents the applicability of a reinforcement learning algorithm based on the application of the Bayesian theorem of probability. The proposed reinforcement learning algorithm is an advantageous and indispensable tool for ALBidS (Adaptive Learning strategic Bidding System), a multi-agent system that has the purpose of providing decision support to electricity market negotiating players. ALBidS uses a set of different strategies for providing decision support to market players. These strategies are used accordingly to their probability of success for each different context. The approach proposed in this paper uses a Bayesian network for deciding the most probably successful action at each time, depending on past events. The performance of the proposed methodology is tested using electricity market simulations in MASCEM (Multi-Agent Simulator of Competitive Electricity Markets). MASCEM provides the means for simulating a real electricity market environment, based on real data from real electricity market operators.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação para obtenção do Grau de Mestre em Engenharia Informática

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Worldwide electricity markets have been evolving into regional and even continental scales. The aim at an efficient use of renewable based generation in places where it exceeds the local needs is one of the main reasons. A reference case of this evolution is the European Electricity Market, where countries are connected, and several regional markets were created, each one grouping several countries, and supporting transactions of huge amounts of electrical energy. The continuous transformations electricity markets have been experiencing over the years create the need to use simulation platforms to support operators, regulators, and involved players for understanding and dealing with this complex environment. This paper focuses on demonstrating the advantage that real electricity markets data has for the creation of realistic simulation scenarios, which allow the study of the impacts and implications that electricity markets transformations will bring to the participant countries. A case study using MASCEM (Multi-Agent System for Competitive Electricity Markets) is presented, with a scenario based on real data, simulating the European Electricity Market environment, and comparing its performance when using several different market mechanisms.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Neste documento ´e feita a descrição detalhada da integração modular de um script no software OsiriX. O objectivo deste script ´e determinar o diâmetro central da artéria aorta a partir de uma Tomografia Computorizada. Para tal são abordados conceitos relacionados com a temática do processamento de imagem digital, tecnologias associadas, e.g., a norma DICOM e desenvolvimento de software. Como estudo preliminar, são analisados diversos visualizadores de imagens médica, utilizados para investigação ou mesmo comercializados. Foram realizadas duas implementações distintas do plugin. A primeira versão do plugin faz a invocação do script de processamento usando o ficheiro de estudo armazenado em disco; a segunda versão faz a passagem de dados através de um bloco de memória partilhada e utiliza o framework Java Native Interface. Por fim, é demonstrado todo o processo de aposição da Marcação CE de um dispositivo médico de classe IIa e obtenção da declaração de conformidade por parte de um Organismo Notificado. Utilizaram-se os Sistemas Operativos Mac OS X e Linux e as linguagens de programação Java, Objective-C e Python.