21 results for SAMPLE SELECTION
in Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Abstract:
The financial literature and the financial industry often use zero-coupon yield curves as input for testing hypotheses, pricing assets, or managing risk, and they usually assume the provided data to be accurate. We analyse the implications of the methodology and of the sample selection criteria used to estimate the zero-coupon bond yield term structure on the resulting volatility of spot rates with different maturities. We obtain the volatility term structure using historical volatilities and EGARCH volatilities. As input for these volatilities we consider our own spot-rate estimation from GovPX bond data and three popular interest-rate data sets: from the Federal Reserve Board, from the US Department of the Treasury (H15), and from Bloomberg. We find strong evidence that the resulting zero-coupon bond yield volatility estimates, as well as the correlation coefficients among spot and forward rates, depend significantly on the data set. We observe relevant differences in economic terms when the volatilities are used to price derivatives.
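As a rough illustration of the two volatility estimators mentioned in the abstract, the sketch below computes a rolling historical volatility and an EGARCH conditional volatility from a series of spot rates. This is a minimal sketch, not the paper's exact procedure: the series name `spot`, the window length, and the EGARCH(1,1) order are assumptions, and the third-party `arch` package is required.

```python
import pandas as pd
from arch import arch_model  # third-party package for (E)GARCH models

def volatility_estimates(spot: pd.Series, window: int = 60):
    """Historical and EGARCH volatilities of daily spot-rate changes.

    `spot` is assumed to hold zero-coupon spot rates for one maturity."""
    changes = spot.diff().dropna()            # daily spot-rate changes
    hist_vol = changes.rolling(window).std()  # rolling historical volatility
    model = arch_model(changes, vol="EGARCH", p=1, o=1, q=1, mean="Zero")
    res = model.fit(disp="off")               # quasi-ML fit of EGARCH(1,1)
    return hist_vol, res.conditional_volatility
```

Running this separately on spot rates derived from each data source (GovPX-based estimates, Federal Reserve Board, H15, Bloomberg) would reproduce the kind of cross-data-set volatility comparison described above.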
Abstract:
Dissertation submitted for the degree of Master in Education Sciences, specialization in Early Intervention.
Abstract:
Motion-compensated frame interpolation (MCFI) is one of the most efficient solutions to generate side information (SI) in the context of distributed video coding. However, depending on the video content, it creates SI with rather significant motion-compensation errors in some frame regions and rather small ones in others. In this paper, a low-complexity Intra mode selection algorithm is proposed to select the most 'critical' blocks in the WZ frame and provide the decoder with reliable data for those blocks. For each block, the novel coding mode selection algorithm estimates the encoding rate for the Intra-based and WZ coding modes and determines the best coding mode while maintaining a low encoder complexity. The proposed solution is evaluated in terms of rate-distortion performance, with improvements of up to 1.2 dB over a WZ-only coding solution.
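The per-block mode decision can be pictured as follows. This is an illustrative sketch only: the rate estimators below (a DCT log-energy proxy for the Intra rate and a residual log-energy proxy for the WZ rate) are assumptions standing in for the paper's actual low-complexity estimates.

```python
import numpy as np
from scipy.fftpack import dct

def intra_rate_proxy(block):
    # proxy for the Intra coding rate: log-energy of the 2-D DCT coefficients
    coeffs = dct(dct(block.astype(float), axis=0, norm="ortho"),
                 axis=1, norm="ortho")
    return np.log1p(np.abs(coeffs)).sum()

def wz_rate_proxy(block, predictor):
    # proxy for the WZ coding rate: log-energy of the residual w.r.t. the
    # MCFI-generated predictor (stand-in for a correlation-noise estimate)
    return np.log1p(np.abs(block.astype(float) - predictor)).sum()

def select_modes(blocks, predictors):
    # keep the cheaper mode per block; 'intra' marks the critical blocks
    return ["intra" if intra_rate_proxy(b) < wz_rate_proxy(b, p) else "wz"
            for b, p in zip(blocks, predictors)]
```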
Abstract:
Reclaimed water from small wastewater treatment facilities in the rural areas of the Beira Interior region (Portugal) may constitute an alternative water source for aquifer recharge. A 21-month monitoring period in a constructed wetland treatment system has shown that 21,500 m³ year⁻¹ of treated wastewater (reclaimed water) could be used for aquifer recharge. A GIS-based multi-criteria analysis was performed, combining ten thematic maps and economic, environmental and technical criteria, in order to produce a suitability map for the location of sites for reclaimed water infiltration. The areas chosen for aquifer recharge with infiltration basins are mainly composed of anthrosol, more than 1 m deep and of fine sand texture, which allows an average infiltration velocity of up to 1 m d⁻¹. These characteristics provide a final polishing treatment of the reclaimed water after infiltration (soil aquifer treatment (SAT)), suitable for the removal of the residual load (trace organics, nutrients, heavy metals and pathogens). The risk of groundwater contamination is low, since the water table in the anthrosol areas ranges from 10 m to 50 m deep. On the other hand, these depths guarantee an unsaturated zone suitable for SAT. An area of 13,944 ha was selected for study, but only 1607 ha are suitable for reclaimed water infiltration. Approximately 1280 m² were considered enough to set up 4 infiltration basins operating in flooding and drying cycles.
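A quick consistency check of the figures quoted above (all values taken from the abstract; the cycle/clogging interpretation of the margin is an assumption) shows why roughly 1280 m² is comfortably sufficient:

```python
annual_volume = 21_500                      # m3/year of reclaimed water
daily_volume = annual_volume / 365          # ~59 m3/day to infiltrate
infiltration_velocity = 1.0                 # m/day (quoted average maximum)
min_wetted_area = daily_volume / infiltration_velocity  # ~59 m2

basin_area = 1_280                          # m2, split over 4 basins
safety_margin = basin_area / min_wetted_area            # ~22x
print(round(min_wetted_area), round(safety_margin, 1))  # 59 21.7
```

The large margin leaves room for the flooding and drying cycles (each basin rests part of the time) and for infiltration rates below the quoted maximum as the basins clog.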
Abstract:
Introduction – Mammography is the gold standard for screening and imaging diagnosis of breast disease, and it is the imaging modality recommended by screening programs in several countries in Europe and in the United States. The implementation of digital technology changed mammography practice and triggered the need to adjust quality control programs. Aims – To characterize the technology for mammography installed in Portugal; to assess the practices in use in mammography, their harmonization and their compliance with international guidelines; and to identify optimization opportunities so that the technology is used effectively, safely and to its full potential. Methodology – A literature review was performed. Data on the number and specifications of the mammography equipment installed in Portugal were collected from official sources (governmental bodies, mammography healthcare providers and the medical imaging industry). Three questionnaires, targeted at radiologists, breast radiographers and chief radiographers, were designed to collect data on the technical and clinical practices in mammography. The questionnaires were delivered to a sample of 65 mammography providers selected according to geographical criteria, type of technology installed and institution profile. Results – 441 mammography systems were identified in Portugal. The most frequent technology (62%), commonly known as computed radiography (CR), consists of a detector (image plate) of photostimulable material inserted in a supporting cassette plus an optical readout system; most of these systems (78%) are installed in the private sector. Approximately 12% of the installed equipment are direct digital radiography (DDR) systems. The criteria for selecting the exposure parameters differ between institutions, with the majority (65%) following the equipment manufacturers' recommendations. The use of the available post-processing tools is limited, the most frequently reported by radiologists being contrast/brightness adjustment and full and/or localized image magnification. Fifteen participating institutions (out of 19) have implemented a quality control programme. Conclusions – The installed mammography technology in Portugal is heterogeneous and includes both obsolete and state-of-the-art equipment. International guidelines (European or American) are not formally adopted in most institutions as the basis for mammography practice, the manufacturers' recommendations being the most frequently used guidance. The radiographers and radiologists identified specialized education and training needs, particularly in the areas of breast intervention, dose optimization and quality control. Most participants agree with the need for certification of mammography practice in Portugal and would take part in a voluntary programme.
Abstract:
Master's degree in Radiation Applied to Health Technologies.
Abstract:
Background: A common task in analyzing microarray data is to determine which genes are differentially expressed across two (or more) kinds of tissue samples, or samples obtained under different experimental conditions. Several statistical methods have been proposed to accomplish this goal, generally based on measures of distance between classes. It is well known that biological samples are heterogeneous because of factors such as molecular subtypes or genetic background that are often unknown to the experimenter. For instance, in experiments involving the molecular classification of tumors, it is important to identify significant subtypes of cancer. Bimodal or multimodal distributions often reflect the presence of mixtures of subsamples. Consequently, there can be genes that are differentially expressed only in sample subgroups, and these are missed if the usual statistical approaches are used. In this paper we propose a new graphical tool which identifies not only genes with up- and down-regulation, but also genes with differential expression in different subclasses, which are usually missed by current statistical methods. This tool is based on two measures of distance between samples, namely the overlapping coefficient (OVL) between two densities and the area under the receiver operating characteristic (ROC) curve. The proposed methodology was implemented in the open-source R software. Results: The method was applied to a publicly available dataset, as well as to a simulated dataset. We compared our results with those obtained using some of the standard methods for detecting differentially expressed genes, namely the Welch t-statistic, fold change (FC), rank products (RP), average difference (AD), weighted average difference (WAD), moderated t-statistic (modT), intensity-based moderated t-statistic (ibmT), significance analysis of microarrays (samT) and area under the ROC curve (AUC). On both datasets, the differentially expressed genes with bimodal or multimodal distributions were not selected by any of the standard selection procedures. We also compared our results with (i) the area between the ROC curve and the rising area (ABCR) and (ii) the test for not proper ROC curves (TNRC). We found our methodology more comprehensive, because it detects both bimodal and multimodal distributions, and different variances can be considered in both samples. Another advantage of our method is that the behavior of different kinds of differentially expressed genes can be analyzed graphically. Conclusion: Our results indicate that the arrow plot represents a new, flexible and useful tool for the analysis of gene expression profiles from microarrays.
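A minimal sketch of the two distance measures underlying the arrow plot, for one gene measured in two sample classes `x` and `y`. The original tool is implemented in R; this Python version, using kernel density estimates for the OVL and the rank-based AUC, is an assumed re-implementation of the general idea.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.metrics import roc_auc_score

def ovl(x, y, grid_size=512):
    """Overlapping coefficient: area shared by the two class densities."""
    grid = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), grid_size)
    fx, fy = gaussian_kde(x)(grid), gaussian_kde(y)(grid)
    return np.trapz(np.minimum(fx, fy), grid)

def auc(x, y):
    """Area under the ROC curve separating the two classes."""
    labels = np.r_[np.zeros(len(x)), np.ones(len(y))]
    return roc_auc_score(labels, np.r_[x, y])
```

Plotting one measure against the other flags genes whose class densities barely overlap yet have an AUC near 0.5, which is how the bimodal and multimodal cases missed by the standard methods can surface.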
Abstract:
Attending a Specialized Teaching Internship provides an opportunity for closeness between the future teacher and the student. Observing methodologies and engaging with the music and the musical context of schools and students is fundamental to the proper training of teachers. Given the practical character of instrumental teaching, this interaction proves essential to the pedagogical experience. This internship aimed to raise the quality of teaching and to expand the set of pedagogical tools, comparing them with tools learned and articulated through previously acquired experience as a teacher. Piano teaching pedagogy has undergone several modifications and adaptations, reflecting the environment and the resources available. The existence of conservatories where students can attend a large number of subjects has brought significant advantages to their musical culture. Although the study of the instrument remains the main practice, the reduction in study schedules requires a careful selection of the instrument's syllabus content, so as not to jeopardize the students' technical development. An area rarely found in this content is sight-reading. The research presented here sought to understand how sight-reading of a score at the piano takes place and how this skill can be developed. Drawing on the bibliographic sources consulted and on the analysis of representative teaching methods, the investigation addressed how a reader sight-reads a score, the importance of touch and sight, and the extent to which this skill can be studied and developed. What are the key factors in sight-reading? To what extent can practice at the instrument influence this ability? Although sight-reading has been removed as a subject and a form of assessment in several schools, a number of surveys were carried out to assess the importance given to it by teachers and students.
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches focusing on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), in which a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover the ground truth. An application to real data from official statistics shows its usefulness.
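For concreteness, here is a minimal EM sketch for the base model, a finite mixture of multinomials over one-hot-encoded categorical data. The feature-relevance latent variables and the minimum message length penalty that distinguish the proposed method are deliberately omitted; names and initialization choices are assumptions.

```python
import numpy as np

def em_multinomial_mixture(X, k, n_iter=100, eps=1e-9, seed=0):
    """X: (n, d) one-hot/count matrix; k: number of mixture components."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)                   # mixing weights
    theta = rng.dirichlet(np.ones(d), size=k)  # per-component category probs
    for _ in range(n_iter):
        # E-step: responsibilities from log-domain multinomial likelihoods
        log_r = np.log(pi) + X @ np.log(theta + eps).T
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights and category probabilities
        pi = r.mean(axis=0)
        theta = r.T @ X + eps
        theta /= theta.sum(axis=1, keepdims=True)
    return pi, theta, r
```

In the full method, an extra set of latent indicators per feature would enter the E-step, and the M-step would prune irrelevant features (and components) using the message-length criterion.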
Abstract:
Electrocardiography (ECG) biometrics is emerging as a viable biometric trait. Recent developments at the sensor level have shown the feasibility of performing signal acquisition at the fingers and hand palms, using one-lead sensor technology and dry electrodes. These new locations lead to ECG signals with a lower signal-to-noise ratio that are more prone to noise artifacts; heart rate variability is another major challenge of this biometric trait. In this paper we propose a novel approach to ECG biometrics, with the purpose of reducing the computational complexity and increasing the robustness of the recognition process, enabling the fusion of information across sessions. Our approach is based on clustering, grouping individual heartbeats by their morphology. We study several methods to perform automatic template selection and to account for the variations observed in a person's biometric data. This approach allows the identification of different template groupings, taking heart rate variability into account, and the removal of outliers due to noise artifacts. Experimental evaluation on real-world data demonstrates the advantages of our approach.
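A minimal sketch of the clustering-based template selection, under assumed details: k-means on aligned heartbeat waveforms, cluster centroids kept as templates, and distance-based outlier removal.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_templates(beats, n_clusters=3, outlier_quantile=0.95):
    """beats: (n_beats, n_samples) array of aligned heartbeat waveforms."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(beats)
    # distance of each beat to its own cluster centroid
    dists = np.linalg.norm(beats - km.cluster_centers_[km.labels_], axis=1)
    keep = dists <= np.quantile(dists, outlier_quantile)  # drop noisy beats
    return np.array([beats[keep & (km.labels_ == c)].mean(axis=0)
                     for c in range(n_clusters)])
```

Keeping one template per morphological cluster, rather than a single average beat, is what absorbs heart rate variability, while the quantile cut removes beats corrupted by noise artifacts.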
Abstract:
In research on Silent Speech Interfaces (SSI), different sources of information (modalities) have been combined, aiming at obtaining better performance than with the individual modalities. However, when combining these modalities, the dimensionality of the feature space rapidly increases, yielding the well-known "curse of dimensionality". As a consequence, in order to extract useful information from this data, one has to resort to feature selection (FS) techniques to lower the dimensionality of the learning space. In this paper, we assess the impact of FS techniques on silent speech data, in a dataset with 4 non-invasive and promising modalities, namely: video, depth, ultrasonic Doppler sensing, and surface electromyography. We consider two supervised (mutual information and Fisher's ratio) and two unsupervised (mean-median and arithmetic mean/geometric mean) FS filters. The evaluation was made by assessing the classification accuracy (word recognition error) of three well-known classifiers (k-nearest neighbors, support vector machines, and dynamic time warping). The key results of this study show that both unsupervised and supervised FS techniques improve the classification accuracy on both individual and combined modalities. For instance, on the video component, we attain relative performance gains of 36.2% in error rates. FS is also useful as pre-processing for feature fusion.
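The four filters can be summarized as one relevance score per feature. The formulas below are the standard forms of the criteria named above (Fisher's ratio, mutual information, and two unsupervised dispersion measures), given as an assumed reading rather than the paper's exact implementation.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def fisher_ratio(X, y):
    # supervised: between-class separation over within-class spread
    m0, m1 = X[y == 0].mean(0), X[y == 1].mean(0)
    v0, v1 = X[y == 0].var(0), X[y == 1].var(0)
    return (m0 - m1) ** 2 / (v0 + v1 + 1e-12)

def mutual_info(X, y):
    # supervised: estimated mutual information between feature and label
    return mutual_info_classif(X, y)

def mean_median(X):
    # unsupervised: |mean - median| per feature
    return np.abs(X.mean(0) - np.median(X, axis=0))

def am_gm(X, eps=1e-12):
    # unsupervised: arithmetic-to-geometric mean ratio (>= 1 by AM-GM,
    # larger when a feature's values are more spread out)
    Xp = X - X.min(0) + eps       # shift so the geometric mean is defined
    return Xp.mean(0) / np.exp(np.log(Xp).mean(0))
```

Features are ranked by each score and the top-ranked subset is fed to the classifiers.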
Abstract:
In cluster analysis, it can be useful to interpret the partition built from the data in the light of external categorical variables which are not directly involved in clustering the data. An approach is proposed, in the model-based clustering context, to select a number of clusters which both fits the data well and takes advantage of the potential illustrative ability of the external variables. This approach makes use of the integrated joint likelihood of the data and the partitions at hand, namely the model-based partition and the partitions associated with the external variables. It is noteworthy that each mixture model is fitted to the data alone by the maximum likelihood methodology; the external variables are used only to select a relevant mixture model. Numerical experiments illustrate the promising behaviour of the derived criterion.
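As a loose, illustrative stand-in for the criterion (not the integrated joint likelihood itself), one can score each candidate number of clusters by the mixture fit plus how well the resulting partition explains an external categorical variable `z`:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def score_k(X, z, k):
    """Illustrative score: mixture BIC plus external-partition agreement."""
    gm = GaussianMixture(n_components=k, n_init=5).fit(X)  # fit on X only
    labels = gm.predict(X)
    counts = np.zeros((k, z.max() + 1))
    for c, v in zip(labels, z):                # cluster-by-category table
        counts[c, v] += 1
    probs = (counts + 1e-9) / counts.sum(axis=1, keepdims=True)
    ext_ll = (counts * np.log(probs)).sum()    # how well clusters predict z
    return -gm.bic(X) + 2 * ext_ll             # higher is better (ad hoc weight)
```

Note that, as in the abstract, the mixture is fitted to the data alone; the external variable only enters the model-selection score.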
Abstract:
Many learning problems require handling high-dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) to the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve classification accuracy and reduce memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium- and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.
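A minimal sketch of the combined pipeline, under assumed details: equal-frequency binning as the unsupervised discretization, followed by a simple dispersion criterion for the subsequent feature ranking.

```python
import numpy as np

def discretize_equal_freq(X, n_bins=8):
    """Bin each feature of X (n_samples, n_features) into equal-frequency bins."""
    edges = np.quantile(X, np.linspace(0, 1, n_bins + 1)[1:-1], axis=0)
    return np.stack([np.searchsorted(edges[:, j], X[:, j])
                     for j in range(X.shape[1])], axis=1)

def rank_features(Xd):
    """Rank discretized features, most dispersed first."""
    return np.argsort(-Xd.var(axis=0))
```

Discretizing first keeps the memory footprint small (a few bits per value), which is part of the point of combining the two steps on high-dimensional data.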
Abstract:
Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), using search-based or wrapper techniques can be computationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take a prohibitively long time on high-dimensional datasets. In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria, applicable to supervised, semi-supervised, and unsupervised learning, and are able to act as pre-processors for computationally intensive methods, focusing their attention on smaller subsets of promising features. The experimental results, with up to 10⁵ features, show the time efficiency of our methods, with lower generalization error than state-of-the-art techniques, while being dramatically simpler and faster.
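A sketch of a low-complexity relevance/redundancy filter in the spirit described above, with assumed criteria: variance as the unsupervised relevance measure and pairwise correlation as the redundancy measure.

```python
import numpy as np

def rr_filter(X, n_select, max_corr=0.9):
    """Greedy relevance/redundancy selection on X (n_samples, n_features)."""
    order = np.argsort(-X.var(axis=0))   # relevance: descending variance
    selected = []
    for j in order:
        # keep feature j only if not too correlated with anything already kept
        if all(abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) < max_corr
               for s in selected):
            selected.append(j)
        if len(selected) == n_select:
            break
    return selected
```

Both criteria are linear in the number of instances, which is what makes such filters usable as pre-processors on datasets with on the order of 10⁵ features.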
Abstract:
We propose a blind method to detect interference in GNSS signals, whereby the algorithms require no knowledge of the interference or of the channel noise features. A sample covariance matrix is constructed from the received signal and its eigenvalues are computed. The generalized likelihood ratio test (GLRT) and the condition number test (CNT) are developed and compared in the detection of sinusoidal and chirp jamming signals. A computationally efficient decision threshold is proposed for the CNT.
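In one common form (the exact statistics and thresholds in the paper may differ), both detectors reduce to a few lines once snapshots of the received signal are stacked into an N x M matrix:

```python
import numpy as np

def interference_stats(snapshots):
    """snapshots: (N, M) complex array, N samples per snapshot, M snapshots."""
    N, M = snapshots.shape
    R = snapshots @ snapshots.conj().T / M        # sample covariance matrix
    eig = np.sort(np.linalg.eigvalsh(R))[::-1]    # eigenvalues, descending
    glrt = eig[0] / eig.mean()                    # GLRT: peak vs. average power
    cnt = eig[0] / eig[-1]                        # CNT: condition number
    return glrt, cnt
```

Under noise only, the eigenvalues are roughly equal and both statistics stay close to 1; a strong sinusoidal or chirp jammer inflates the largest eigenvalue and drives them up, so detection amounts to comparing each statistic with a threshold set from the desired false-alarm probability.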