47 results for Selection methods
in Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Abstract:
Automatically organizing e-mail messages is a current challenge in machine learning. Message overload affects a growing number of users, especially those who rely on e-mail as a communication and work tool. This thesis addresses the problem of automatic e-mail organization by proposing a solution aimed at automatically labelling messages. Labelling relies on the e-mail folders previously created by users, which are treated as labels, and on suggesting multiple labels for each message (top-N). Several learning techniques are studied, and the fields that make up an e-mail message are analysed to determine their suitability as classification elements. The work focuses on the textual fields (the subject and body of the messages), studying different representations, feature selection methods and classification algorithms. The participant fields are also evaluated, using classification algorithms that represent them either with the vector space model or as a graph. The fields are combined for classification using the Majority Voting classifier combination technique. Tests are carried out on a subset of the Enron e-mail messages and on a private data set provided by the Institute for Systems and Technologies of Information, Control and Communication (INSTICC). Both data sets are analysed to understand the characteristics of the data. The system is evaluated using the accuracy of the classifiers. The results show significant improvements over related work.
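As an illustration of the combination step described in this abstract, the following is a minimal sketch of majority voting over per-field classifiers with a top-N label suggestion. The function name `top_n_majority_vote`, the field names and the folder labels are hypothetical, and the tie-breaking rule (alphabetical order) is an assumption; the abstract does not specify how the thesis implements any of these details.

```python
from collections import Counter


def top_n_majority_vote(predictions, n=3):
    """Combine label predictions by majority vote and return up to n labels."""
    votes = Counter(predictions)
    # Sort by vote count (descending), then by label name so ties are deterministic.
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:n]]


# Hypothetical outputs of three per-field classifiers for a single message.
field_predictions = {
    "subject": "projects",
    "body": "projects",
    "participants": "travel",
}

# Suggest the two most-voted folders (labels) for this message.
suggested = top_n_majority_vote(field_predictions.values(), n=2)
```

Here each field contributes one vote; a variant could let each classifier vote with its own ranked list of labels.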
Abstract:
Locomotor task characterization plays an important role in trying to improve the quality of life of a growing elderly population. This paper addresses this matter by characterizing the locomotion of two population groups with different functional fitness levels (high or low) while executing three different tasks: gait, stair ascent and stair descent. Features were extracted from gait data, and feature selection methods were used to obtain the set of features that allows differentiation between functional fitness levels. Unsupervised learning was used to validate the sets obtained and, ultimately, indicated that it is possible to distinguish the two population groups. The sets of most discriminative features for each task are identified and thoroughly analysed. Copyright © 2014 SCITEPRESS - Science and Technology Publications. All rights reserved.
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, drawn from official statistics, shows its usefulness.
Abstract:
Electrocardiography (ECG) biometrics is emerging as a viable biometric trait. Recent developments at the sensor level have shown the feasibility of performing signal acquisition at the fingers and hand palms, using one-lead sensor technology and dry electrodes. These new locations lead to ECG signals with a lower signal-to-noise ratio that are more prone to noise artifacts; heart rate variability is another major challenge of this biometric trait. In this paper we propose a novel approach to ECG biometrics, with the purpose of reducing the computational complexity and increasing the robustness of the recognition process, enabling the fusion of information across sessions. Our approach is based on clustering, grouping individual heartbeats based on their morphology. We study several methods to perform automatic template selection and account for variations observed in a person's biometric data. This approach allows the identification of different template groupings, taking into account the heart rate variability, and the removal of outliers due to noise artifacts. Experimental evaluation on real world data demonstrates the advantages of our approach.
Abstract:
Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), using search-based or wrapper techniques can be computationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take a prohibitively long time on high-dimensional datasets. In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria, applicable to supervised, semi-supervised, and unsupervised learning, and can act as pre-processors that let computationally intensive methods focus their attention on smaller subsets of promising features. The experimental results, with up to 10^5 features, show the time efficiency of our methods, with lower generalization error than state-of-the-art techniques, while being dramatically simpler and faster.
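To make the relevance/redundancy idea concrete, here is a minimal sketch of an unsupervised filter. It assumes variance as the relevance criterion and absolute cosine similarity as the redundancy criterion; both choices, the function names and the 0.9 threshold are illustrative assumptions, not the criteria defined in the paper.

```python
import math


def variance(column):
    """Population variance of one feature column (used here as relevance)."""
    mean = sum(column) / len(column)
    return sum((x - mean) ** 2 for x in column) / len(column)


def abs_cosine(a, b):
    """Absolute cosine similarity between two columns (used here as redundancy)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return abs(dot) / norm if norm > 0 else 0.0


def relevance_redundancy_filter(columns, max_features, max_similarity=0.9):
    """Return indices of up to max_features relevant, mutually non-redundant features."""
    # Rank features by relevance, most relevant first.
    ranked = sorted(range(len(columns)), key=lambda j: variance(columns[j]), reverse=True)
    kept = []
    for j in ranked:
        if len(kept) == max_features:
            break
        # Keep a feature only if it is not too similar to any feature already kept.
        if all(abs_cosine(columns[j], columns[k]) <= max_similarity for k in kept):
            kept.append(j)
    return kept


# Toy data: each inner list is one feature column over four instances.
columns = [
    [1.0, 2.0, 3.0, 4.0],  # feature 0
    [2.0, 4.0, 6.0, 8.0],  # feature 1: scaled copy of feature 0, hence redundant with it
    [4.0, 1.0, 3.0, 0.5],  # feature 2
    [1.0, 1.0, 1.0, 1.0],  # feature 3: constant, zero variance
]
selected = relevance_redundancy_filter(columns, max_features=2)
```

Because the ranking is a single sort and each candidate is compared only against the features already kept, the cost stays low when few features are retained, which is the property such filters rely on when acting as pre-processors.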
Abstract:
Low noise surfaces have been increasingly considered as a viable and cost-effective alternative to acoustical barriers. However, road planners and administrators frequently lack information on the correlation between the type of road surface and the resulting noise emission profile. To address this problem, a method to identify and classify different types of road pavements was developed, whereby near field road noise is analyzed using statistical learning methods. The vehicle rolling sound signal near the tires and close to the road surface was acquired by two microphones in a special arrangement which implements the Close-Proximity method. A set of features, characterizing the properties of the road pavement, was extracted from the corresponding sound profiles. A feature selection method was used to automatically select those that are most relevant in predicting the type of pavement, while reducing the computational cost. A set of different types of road pavement segments were tested and the performance of the classifier was evaluated. Results of pavement classification performed during a road journey are presented on a map, together with geographical data. This procedure leads to a considerable improvement in the quality of road pavement noise data, thereby increasing the accuracy of road traffic noise prediction models.
Abstract:
Motion compensated frame interpolation (MCFI) is one of the most efficient solutions to generate side information (SI) in the context of distributed video coding. However, depending on the video content, it creates SI with rather significant motion compensation errors in some frame regions and rather small errors in others. In this paper, a low complexity Intra mode selection algorithm is proposed to select the most 'critical' blocks in the WZ frame and help the decoder with some reliable data for those blocks. For each block, the novel coding mode selection algorithm estimates the encoding rate for the Intra based and WZ coding modes and determines the best coding mode while maintaining a low encoder complexity. The proposed solution is evaluated in terms of rate-distortion performance, with improvements of up to 1.2 dB over a solution using the WZ coding mode only.
Abstract:
In this work, 14 primary schools in the city of Lisbon, Portugal, completed a questionnaire of the ISAAC - International Study of Asthma and Allergies in Childhood Program, in 2009/2010. The questionnaire contained questions to identify children with respiratory diseases (wheeze, asthma and rhinitis). Total particulate matter (TPM) was passively collected inside two classrooms of each of the 14 primary schools. Two types of filter matrices were used to collect TPM: Millipore (Isopore™) polycarbonate and quartz. Three campaigns were selected for the measurement of TPM: Spring, Autumn and Winter. The main difference between the two types of filters is that the mass of collected particles was higher on quartz filters than on polycarbonate filters, even though the two are very well correlated. The highest TPM depositions occurred between October 2009 and March 2010, when they were related to the proportion of rhinitis. Rhinitis was found to be related to TPM when the data were grouped seasonally and averaged over all the schools. For the data of 2006/2007, the seasonal variation was found to be related to outdoor particle deposition (below 10 μm).
Abstract:
Reclaimed water from small wastewater treatment facilities in the rural areas of the Beira Interior region (Portugal) may constitute an alternative water source for aquifer recharge. A 21-month monitoring period in a constructed wetland treatment system has shown that 21,500 m³ year⁻¹ of treated wastewater (reclaimed water) could be used for aquifer recharge. A GIS-based multi-criteria analysis was performed, combining ten thematic maps and economic, environmental and technical criteria, in order to produce a suitability map for the location of sites for reclaimed water infiltration. The areas chosen for aquifer recharge with infiltration basins are mainly composed of anthrosol more than 1 m deep with a fine sand texture, which allows an average infiltration velocity of up to 1 m d⁻¹. These characteristics will provide a final polishing treatment of the reclaimed water after infiltration (soil aquifer treatment (SAT)), suitable for the removal of the residual load (trace organics, nutrients, heavy metals and pathogens). The risk of groundwater contamination is low since the water table in the anthrosol areas ranges from 10 m to 50 m. On the other hand, these depths allow a guaranteed unsaturated area suitable for SAT. An area of 13,944 ha was selected for study, but only 1607 ha are suitable for reclaimed water infiltration. Approximately 1280 m² were considered enough to set up 4 infiltration basins to work in flooding and drying cycles.
Abstract:
Chromium dioxide (CrO2) has been extensively used in the magnetic recording industry. However, it is its ferromagnetic half-metallic nature that has more recently attracted much attention, primarily for the development of spintronic devices. CrO2 is the only stoichiometric binary oxide theoretically predicted to be fully spin polarized at the Fermi level. It presents a Curie temperature of ∼ 396 K, i.e. well above room temperature, and a magnetic moment of 2 μB per formula unit. However, an antiferromagnetic native insulating layer of Cr2O3 is always present on the CrO2 surface, which enhances the CrO2 magnetoresistance and might be used as a barrier in magnetic tunnel junctions.
Abstract:
Background: With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide reproducible and comparable results needed for a global scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated, since no user-friendly tool exists to analyze and explore it. Results: PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole genome sequence approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. Conclusions: PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net.
Abstract:
The crucial role of the school in society and the author's professional practice as a teacher, with a close eye on how educational policies are drawn up, motivated this research. Its object of study is the roles played by the directors of state and non-state schools, and its specific objectives are to study the impact of legislation issued by the supervising ministry on public and private schools and to analyse the convergences and divergences in the conceptions and practices of their directors. The analytical dimensions explored cover the directors' managerial conceptions regarding management models, autonomy practices, educational service and accountability. This qualitative study focuses on a restricted group of educational actors, chosen because of the role they play in the educational organization and because the publication of Decreto-Lei 75/2008 of 22 April brought changes to the public school. The tradition of collegial leadership that prevailed in state educational organizations was broken. The president of the executive board is henceforth replaced by the director, who delegates powers, appoints teams and is accountable to the ministry and to the educational community, much like the director of a private school. The case study was carried out in three public schools and three private schools, using semi-structured interviews and document analysis. The conclusions point to many points of convergence between the opinions of public and private school directors. Issues relating to autonomy, the choice of teaching staff and accountability are viewed from the same perspective. Autonomy is seen as “a mirage”, a “promised land” (Lima e Afonso, 1995). Accountability is required of directors in both state and private education through similar instruments.
The main divergences lie in the lesser interest shown by private school management in offering vocational courses and in the smaller investment in strategies to prevent school dropout, which is considered of little significance in non-state schools. Support for school choice and for the school voucher scheme are other points of divergence between these directors. Abstract: This investigative paper - whose objective is the study of the role of the school directors, both State and non-state, and the impact of legislation on both State and private schools, as well as the analysis of the convergent and divergent conceptions and practices of these directors – is motivated by the crucial role played by schools in our society and by the professional activity of the teacher, with an attentive look at the educational practices. The analytical dimension explored in this study includes the various concepts of management of the school director as models of management, as well as practices in self-sufficiency, budget control and educational service to the community. This study has a qualitative nature and focuses on a small group of individuals who were chosen for the role they play in the whole educational structure, considering that the Decree nº 75/2008, published on April the 22nd, determined alterations to the public school system. The traditional method of control of the public school system has, henceforth, been changed. The headmaster is now substituted by a director who delegates his functions, makes up work teams and elaborates the school budget which is presented to the respective governmental ministry and the community, much like what happens in private schools. The present study encompasses three public schools and three private schools, the methods of study being semi-structured interviews as well as the consultation of documentation.
The conclusions point to many convergent opinions of the school directors of both the public and the private sector. The school directors of both public and private schools used in this study share the same opinion as to the factors involved in the selection of teachers, the elaboration of the school budget and the implementation of self-sufficiency policies. These self-sufficiency policies are seen as a “mirage” or a “promised land” (Lima and Afonso, 1995). The school budget and its management practices are implemented in both public and private schools through similar instruments. The principal differences are noted on smaller, less interesting points, on the part of the direction of the private schools, and result from the elaboration of professional courses and minor investment in the strategies, oriented to the prevention of school drop-outs, which is considered of little significance in the private school sector. The other factors of divergence result from the right to choose the type of school desired and the type of teaching implemented.
Abstract:
Introduction – In an era when external radiotherapy (RT) treatments demand ever greater precision, medical imaging makes it possible to measure, quantify and evaluate the impact of errors caused by treatment execution or by organ motion. Objective – To analyse the data available in the literature on setup deviations (SD) in head and neck (H&N) and prostate pathologies, measured with Cone Beam Computed Tomography (CBCT) or Electronic Portal Image Device (EPID). Methodology – For this literature review, articles were searched in the MEDLINE/PubMed and b-on databases. Articles reporting SD in H&N and prostate pathologies measured with CBCT and EPID were included. Validation criteria were then applied to select the studies. Results – After analysing 35 articles, 13 studies were included and 9 were validated. For H&N tumours, the mean (μ) SD lies between 0.0 and 1.2 mm, with a maximum standard deviation (σ) of 1.3 mm. For prostate pathologies, the mean SD lies between 0.0 and 7.1 mm, with a maximum σ of 7.5 mm. Discussion/Conclusion – SD in H&N pathologies are mostly attributed to the side effects of RT, such as mucositis and pain, which affect swallowing and lead to weight loss, contributing to instability of the patient's position during treatment and increasing positioning uncertainty. Prostate motion is mainly due to variations in bladder and rectal filling and in intestinal gas. Not knowing the SD negatively affects the precision of RT. It is important to detect and quantify them in order to calculate adequate margins and the magnitude of the errors, increasing the precision of RT delivery and improving patient safety.
- ABSTRACT - Background and Purpose – In an era where precision is an increasing necessity in external radiotherapy (RT), modern medical imaging techniques provide means for measuring, quantifying and evaluating the impact of treatment execution and movement error. The aim of this paper is to review the current literature on the quantification of setup deviations (SD) in patients with head and neck (H&N) or prostate tumors, using Cone Beam Computed Tomography (CBCT) or Electronic Portal Image Device (EPID). Methods – According to the study protocol, MEDLINE/PubMed and b-on databases were searched for trials, which were analyzed using selection criteria based on the quality of the articles. Results – After assessment of 35 papers, 13 studies were included in this analysis and nine were validated (6 for prostate and 3 for H&N tumors). The SD in the treatment of H&N cancer patients is in the interval of 0.1 to 1.2mm, whereas in prostate cancer this interval is 0.0 to 7.1mm. Discussion – The reproducibility of patient positioning is the biggest barrier to higher precision in RT, which is affected by geometrical uncertainty, positioning errors and inter- or intra-fraction organ movement. There are random and systematic errors associated with patient positioning, introduced from the treatment planning phase onwards or through physiological organ movement. Conclusion – The H&N SD are mostly attributed to the adverse effects of radiotherapy, like mucositis and pain, which affect swallowing and decrease secretions, contributing to the instability of patient positioning during RT treatment and increasing positioning uncertainties. Prostate motion is mainly related to the variation in bladder and rectal filling. Ignoring SD negatively affects the accuracy of RT. Therefore, detection and quantification of SD is crucial in order to calculate appropriate margins and the magnitude of error, and to improve accuracy in RT and patient safety.
Abstract:
Introduction – Longer exposure to microgravity causes musculoskeletal deconditioning that needs to be prevented through training. Objectives – To identify the patterns of these changes and to describe microgravity training programmes and post-exposure strategies. Method – The literature search was conducted in MEDLINE/PubMed and PEDro with the following keywords: “spaceflight rehabilitation”, “spaceflight muscle”, “microgravity muscle” and “bed rest muscle”, followed by a selection of articles. Results – The studies found show a differential muscle-tendon response, with training fully or partially protecting these structures. Conclusion – High-intensity, low-repetition resistance training combined with specific exercises is the most suitable for countering deconditioning. - ABSTRACT - Introduction – The increased microgravity exposure time raised the need for training programs to avoid muscle and tendinous deconditioning. Objectives – To identify the deconditioning patterns and to identify and describe the training programs used for its prevention during and after microgravity exposure. Methods – This literature review is based on a search conducted via MEDLINE/PubMed and PEDro using the following search words: “spaceflight rehabilitation”, “spaceflight muscle”, “microgravity muscle” and “bed rest muscle”. The search was followed by an article selection. Results – The studies reveal a differential exposure phenomenon for which the training programs reviewed are partly effective. Conclusion – According to the literature, high-intensity, low-volume resistance programs with specific exercises are the most appropriate to address the deconditioning problem.