28 resultados para Reserve Selection Algorithms

em Repositório Científico do Instituto Politécnico de Lisboa - Portugal


Relevância:

30.00% 30.00%

Publicador:

Resumo:

Many learning problems require handling high dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Motion compensated frame interpolation (MCFI) is one of the most efficient solutions to generate side information (SI) in the context of distributed video coding. However, it creates SI with rather significant motion compensated errors for some frame regions while rather small for some other regions depending on the video content. In this paper, a low complexity Infra mode selection algorithm is proposed to select the most 'critical' blocks in the WZ frame and help the decoder with some reliable data for those blocks. For each block, the novel coding mode selection algorithm estimates the encoding rate for the Intra based and WZ coding modes and determines the best coding mode while maintaining a low encoder complexity. The proposed solution is evaluated in terms of rate-distortion performance with improvements up to 1.2 dB regarding a WZ coding mode only solution.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

A organização automática de mensagens de correio electrónico é um desafio actual na área da aprendizagem automática. O número excessivo de mensagens afecta cada vez mais utilizadores, especialmente os que usam o correio electrónico como ferramenta de comunicação e trabalho. Esta tese aborda o problema da organização automática de mensagens de correio electrónico propondo uma solução que tem como objectivo a etiquetagem automática de mensagens. A etiquetagem automática é feita com recurso às pastas de correio electrónico anteriormente criadas pelos utilizadores, tratando-as como etiquetas, e à sugestão de múltiplas etiquetas para cada mensagem (top-N). São estudadas várias técnicas de aprendizagem e os vários campos que compõe uma mensagem de correio electrónico são analisados de forma a determinar a sua adequação como elementos de classificação. O foco deste trabalho recai sobre os campos textuais (o assunto e o corpo das mensagens), estudando-se diferentes formas de representação, selecção de características e algoritmos de classificação. É ainda efectuada a avaliação dos campos de participantes através de algoritmos de classificação que os representam usando o modelo vectorial ou como um grafo. Os vários campos são combinados para classificação utilizando a técnica de combinação de classificadores Votação por Maioria. Os testes são efectuados com um subconjunto de mensagens de correio electrónico da Enron e um conjunto de dados privados disponibilizados pelo Institute for Systems and Technologies of Information, Control and Communication (INSTICC). Estes conjuntos são analisados de forma a perceber as características dos dados. A avaliação do sistema é realizada através da percentagem de acerto dos classificadores. Os resultados obtidos apresentam melhorias significativas em comparação com os trabalhos relacionados.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Reclaimed water from small wastewater treatment facilities in the rural areas of the Beira Interior region (Portugal) may constitute an alternative water source for aquifer recharge. A 21-month monitoring period in a constructed wetland treatment system has shown that 21,500 m(3) year(-1) of treated wastewater (reclaimed water) could be used for aquifer recharge. A GIS-based multi-criteria analysis was performed, combining ten thematic maps and economic, environmental and technical criteria, in order to produce a suitability map for the location of sites for reclaimed water infiltration. The areas chosen for aquifer recharge with infiltration basins are mainly composed of anthrosol with more than 1 m deep and fine sand texture, which allows an average infiltration velocity of up to 1 m d(-1). These characteristics will provide a final polishing treatment of the reclaimed water after infiltration (soil aquifer treatment (SAT)), suitable for the removal of the residual load (trace organics, nutrients, heavy metals and pathogens). The risk of groundwater contamination is low since the water table in the anthrosol areas ranges from 10 m to 50 m. Oil the other hand, these depths allow a guaranteed unsaturated area suitable for SAT. An area of 13,944 ha was selected for study, but only 1607 ha are suitable for reclaimed water infiltration. Approximately 1280 m(2) were considered enough to set up 4 infiltration basins to work in flooding and drying cycles.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Financial literature and financial industry use often zero coupon yield curves as input for testing hypotheses, pricing assets or managing risk. They assume this provided data as accurate. We analyse implications of the methodology and of the sample selection criteria used to estimate the zero coupon bond yield term structure on the resulting volatility of spot rates with different maturities. We obtain the volatility term structure using historical volatilities and Egarch volatilities. As input for these volatilities we consider our own spot rates estimation from GovPX bond data and three popular interest rates data sets: from the Federal Reserve Board, from the US Department of the Treasury (H15), and from Bloomberg. We find strong evidence that the resulting zero coupon bond yield volatility estimates as well as the correlation coefficients among spot and forward rates depend significantly on the data set. We observe relevant differences in economic terms when volatilities are used to price derivatives.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This work aims at investigating the impact of treating breast cancer using different radiation therapy (RT) techniques – forwardly-planned intensity-modulated, f-IMRT, inversely-planned IMRT and dynamic conformal arc (DCART) RT – and their effects on the whole-breast irradiation and in the undesirable irradiation of the surrounding healthy tissues. Two algorithms of iPlan BrainLAB treatment planning system were compared: Pencil Beam Convolution (PBC) and commercial Monte Carlo (iMC). Seven left-sided breast patients submitted to breast-conserving surgery were enrolled in the study. For each patient, four RT techniques – f-IMRT, IMRT using 2-fields and 5-fields (IMRT2 and IMRT5, respectively) and DCART – were applied. The dose distributions in the planned target volume (PTV) and the dose to the organs at risk (OAR) were compared analyzing dose–volume histograms; further statistical analysis was performed using IBM SPSS v20 software. For PBC, all techniques provided adequate coverage of the PTV. However, statistically significant dose differences were observed between the techniques, in the PTV, OAR and also in the pattern of dose distribution spreading into normal tissues. IMRT5 and DCART spread low doses into greater volumes of normal tissue, right breast, right lung and heart than tangential techniques. However, IMRT5 plans improved distributions for the PTV, exhibiting better conformity and homogeneity in target and reduced high dose percentages in ipsilateral OAR. DCART did not present advantages over any of the techniques investigated. Differences were also found comparing the calculation algorithms: PBC estimated higher doses for the PTV, ipsilateral lung and heart than the iMC algorithm predicted.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Dissertação de Mestrado para obtenção do grau de Mestre em Engenharia Mecânica Ramo de Manutenção e Produção

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Dissertação para obtenção do grau de Mestre em Engenharia Informática

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Trabalho de Projeto para obtenção do grau de Mestre em Engenharia Informática e de Computadores

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, referred to official statistics, shows its usefulness.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Research on cluster analysis for categorical data continues to develop, new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters is done simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length criterion (MML) to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the (Figueiredo and Jain, 2002) approach. The novelty of the approach rests on the integration of the model estimation and selection of the number of clusters in a single algorithm, rather than selecting this number based on a set of pre-estimated candidate models. The performance of our approach is compared with the use of Bayesian Information Criterion (BIC) (Schwarz, 1978) and Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The obtained results illustrate the capacity of the proposed algorithm to attain the true number of cluster while outperforming BIC and ICL since it is faster, which is especially relevant when dealing with large data sets.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

This paper proposes a stochastic mixed-integer linear approach to deal with a short-term unit commitment problem with uncertainty on a deregulated electricity market that includes day-ahead bidding and bilateral contracts. The proposed approach considers the typically operation constraints on the thermal units and a spinning reserve. The uncertainty is due to the electricity prices, which are modeled by a scenario set, allowing an acceptable computation. Moreover, emission allowances are considered in a manner to allow for the consideration of environmental constraints. A case study to illustrate the usefulness of the proposed approach is presented and an assessment of the cost for the spinning reserve is obtained by a comparison between the situation with and without spinning reserve.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Electricity short-term load forecast is very important for the operation of power systems. In this work a classical exponential smoothing model, the Holt-Winters with double seasonality was used to test for accurate predictions applied to the Portuguese demand time series. Some metaheuristic algorithms for the optimal selection of the smoothing parameters of the Holt-Winters forecast function were used and the results after testing in the time series showed little differences among methods, so the use of the simple local search algorithms is recommended as they are easier to implement.

Relevância:

20.00% 20.00%

Publicador:

Resumo:

Electricity short-term load forecast is very important for the operation of power systems. In this work a classical exponential smoothing model, the Holt-Winters with double seasonality was used to test for accurate predictions applied to the Portuguese demand time series. Some metaheuristic algorithms for the optimal selection of the smoothing parameters of the Holt-Winters forecast function were used and the results after testing in the time series showed little differences among methods, so the use of the simple local search algorithms is recommended as they are easier to implement.