25 resultados para Relevant features
em Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Resumo:
In data clustering, the problem of selecting the subset of most relevant features from the data has been an active research topic. Feature selection for clustering is a challenging task due to the absence of class labels for guiding the search for relevant features. Most methods proposed for this goal are focused on numerical data. In this work, we propose an approach for clustering and selecting categorical features simultaneously. We assume that the data originate from a finite mixture of multinomial distributions and implement an integrated expectation-maximization (EM) algorithm that estimates all the parameters of the model and selects the subset of relevant features simultaneously. The results obtained on synthetic data illustrate the performance of the proposed approach. An application to real data, referred to official statistics, shows its usefulness.
Resumo:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicate the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, referred to official statistics, shows its usefulness.
Resumo:
In machine learning and pattern recognition tasks, the use of feature discretization techniques may have several advantages. The discretized features may hold enough information for the learning task at hand, while ignoring minor fluctuations that are irrelevant or harmful for that task. The discretized features have more compact representations that may yield both better accuracy and lower training time, as compared to the use of the original features. However, in many cases, mainly with medium and high-dimensional data, the large number of features usually implies that there is some redundancy among them. Thus, we may further apply feature selection (FS) techniques on the discrete data, keeping the most relevant features, while discarding the irrelevant and redundant ones. In this paper, we propose relevance and redundancy criteria for supervised feature selection techniques on discrete data. These criteria are applied to the bin-class histograms of the discrete features. The experimental results, on public benchmark data, show that the proposed criteria can achieve better accuracy than widely used relevance and redundancy criteria, such as mutual information and the Fisher ratio.
Resumo:
Locomotor tasks characterization plays an important role in trying to improve the quality of life of a growing elderly population. This paper focuses on this matter by trying to characterize the locomotion of two population groups with different functional fitness levels (high or low) while executing three different tasks-gait, stair ascent and stair descent. Features were extracted from gait data, and feature selection methods were used in order to get the set of features that allow differentiation between functional fitness level. Unsupervised learning was used to validate the sets obtained and, ultimately, indicated that it is possible to distinguish the two population groups. The sets of best discriminate features for each task are identified and thoroughly analysed. Copyright © 2014 SCITEPRESS - Science and Technology Publications. All rights reserved.
Resumo:
Low noise surfaces have been increasingly considered as a viable and cost-effective alternative to acoustical barriers. However, road planners and administrators frequently lack information on the correlation between the type of road surface and the resulting noise emission profile. To address this problem, a method to identify and classify different types of road pavements was developed, whereby near field road noise is analyzed using statistical learning methods. The vehicle rolling sound signal near the tires and close to the road surface was acquired by two microphones in a special arrangement which implements the Close-Proximity method. A set of features, characterizing the properties of the road pavement, was extracted from the corresponding sound profiles. A feature selection method was used to automatically select those that are most relevant in predicting the type of pavement, while reducing the computational cost. A set of different types of road pavement segments were tested and the performance of the classifier was evaluated. Results of pavement classification performed during a road journey are presented on a map, together with geographical data. This procedure leads to a considerable improvement in the quality of road pavement noise data, thereby increasing the accuracy of road traffic noise prediction models.
Resumo:
It has been described that fullerenes (C60) present interesting properties with potential application in clinical conditions related to oxidative stress. One of the most prominent features of fullerenes is the ability to quench free radicals. However, because of its poor solubility, this has been studied mostly in organic solutions, while the antioxidant activity and cytotoxicity of fullerenes and their derivates in aqueous medium is not well characterized. The antioxidant capacity of synthesised C60-conjugates has been investigated and its was higher comparing to C60 isolated. The aim of this study was to assess the viability of C60-conjugates by determining its antioxidant activity and cytotoxicity in bio-relevant media.
Resumo:
In music genre classification, most approaches rely on statistical characteristics of low-level features computed on short audio frames. In these methods, it is implicitly considered that frames carry equally relevant information loads and that either individual frames, or distributions thereof, somehow capture the specificities of each genre. In this paper we study the representation space defined by short-term audio features with respect to class boundaries, and compare different processing techniques to partition this space. These partitions are evaluated in terms of accuracy on two genre classification tasks, with several types of classifiers. Experiments show that a randomized and unsupervised partition of the space, used in conjunction with a Markov Model classifier lead to accuracies comparable to the state of the art. We also show that unsupervised partitions of the space tend to create less hubs.
Resumo:
A criação da medida de política educativa (Despacho Normativo n.º 55/2008) - Territórios Educativos de Intervenção Prioritária (TEIP) corresponde à necessidade de concretização de um princípio da Lei de Bases do Sistema Educativo português, visando assegurar uma educação de base para todos, bem sucedida. Esta mesma medida acontece na sequência de uma outra anterior com as mesmas características, criada no ano lectivo de 1996/1997. A sua implementação no terreno foi objecto de um estudo encomendado pelo Instituto de Inovação Educacional no qual participei, integrando uma equipa de investigação da Faculdade de Psicologia e Ciências da Educação da Universidade de Lisboa. Nesta comunicação, proponho-me revisitar esse mesmo estudo, dando conta do balanço crítico então realizado, e, à luz das suas principais conclusões, refletir agora na qualidade de perito externo de um TEIP da região de Lisboa, sobre os modos como os TEIP2 evoluíram, em que sentido se deu tal evolução e quais os problemas que persistem. A consecução de um dos objectivos centrais desta medida de política educativa, especificamente a melhoria da qualidade das aprendizagens traduzida no sucesso educativo dos alunos, parece, de acordo com os relatórios oficiais, não ser ainda muito expressiva nem muito consistente. Torna-se, portanto, necessário interrogar e problematizar as práticas de ensino realizadas pelos professores, repensando estratégias pedagógicas que se configurem relevantes para dar expressão real àquele objectivo.
Resumo:
A great number of low-temperature geothermal fields occur in Northern-Portugal related to fractured rocks. The most important superficial manifestations of these hydrothermal systems appear in pull-apart tectonic basins and are strongly conditioned by the orientation of the main fault systems in the region. This work presents the interpretation of gravity gradient maps and 3D inversion model produced from a regional gravity survey. The horizontal gradients reveal a complex fault system. The obtained 3D model of density contrast puts into evidence the main fault zone in the region and the depth distribution of the granitic bodies. Their relationship with the hydrothermal systems supports the conceptual models elaborated from hydrochemical and isotopic water analyses. This work emphasizes the importance of the role of the gravity method and analysis to better understand the connection between hydrothermal systems and the fractured rock pattern and surrounding geology. (c) 2013 Elsevier B.V. All rights reserved.
Resumo:
Objective - To describe and validate the simulation of the basic features of GE Millennium MG gamma camera using the GATE Monte Carlo platform. Material and methods - Crystal size and thickness, parallel-hole collimation and a realistic energy acquisition window were simulated in the GATE platform. GATE results were compared to experimental data in the following imaging conditions: a point source of 99mTc at different positions during static imaging and tomographic acquisitions using two different energy windows. The accuracy between the events expected and detected by simulation was obtained with the Mann–Whitney–Wilcoxon test. Comparisons were made regarding the measurement of sensitivity and spatial resolution, static and tomographic. Simulated and experimental spatial resolutions for tomographic data were compared with the Kruskal–Wallis test to assess simulation accuracy for this parameter. Results - There was good agreement between simulated and experimental data. The number of decays expected when compared with the number of decays registered, showed small deviation (≤0.007%). The sensitivity comparisons between static acquisitions for different distances from source to collimator (1, 5, 10, 20, 30cm) with energy windows of 126–154 keV and 130–158 keV showed differences of 4.4%, 5.5%, 4.2%, 5.5%, 4.5% and 5.4%, 6.3%, 6.3%, 5.8%, 5.3%, respectively. For the tomographic acquisitions, the mean differences were 7.5% and 9.8% for the energy window 126–154 keV and 130–158 keV. Comparison of simulated and experimental spatial resolutions for tomographic data showed no statistically significant differences with 95% confidence interval. Conclusions - Adequate simulation of the system basic features using GATE Monte Carlo simulation platform was achieved and validated.
Resumo:
Steatosis, also known as fatty liver, corresponds to an abnormal retention of lipids within the hepatic cells and reflects an impairment of the normal processes of synthesis and elimination of fat. Several causes may lead to this condition, namely obesity, diabetes, or alcoholism. In this paper an automatic classification algorithm is proposed for the diagnosis of the liver steatosis from ultrasound images. The features are selected in order to catch the same characteristics used by the physicians in the diagnosis of the disease based on visual inspection of the ultrasound images. The algorithm, designed in a Bayesian framework, computes two images: i) a despeckled one, containing the anatomic and echogenic information of the liver, and ii) an image containing only the speckle used to compute the textural features. These images are computed from the estimated RF signal generated by the ultrasound probe where the dynamic range compression performed by the equipment is taken into account. A Bayes classifier, trained with data manually classified by expert clinicians and used as ground truth, reaches an overall accuracy of 95% and a 100% of sensitivity. The main novelties of the method are the estimations of the RF and speckle images which make it possible to accurately compute textural features of the liver parenchyma relevant for the diagnosis.
Resumo:
Water covers over 70% of the Earth's surface, and is vital for all known forms of life. But only 3% of the Earth's water is fresh water, and less than 0.3% of all freshwater is in rivers, lakes, reservoirs and the atmosphere. However, rivers and lakes are an important part of fresh surface water, amounting to about 89%. In this Master Thesis dissertation, the focus is on three types of water bodies – rivers, lakes and reservoirs, and their water quality issues in Asian countries. The surface water quality in a region is largely determined both by the natural processes such as climate or geographic conditions, and the anthropogenic influences such as industrial and agricultural activities or land use conversion. The quality of the water can be affected by pollutants discharge from a specific point through a sewer pipe and also by extensive drainage from agriculture/urban areas and within basin. Hence, water pollutant sources can be divided into two categories: Point source pollution and Non-point source (NPS) pollution. Seasonal variations in precipitation and surface run-off have a strong effect on river discharge and the concentration of pollutants in water bodies. For example, in the rainy season, heavy and persistent rain wash off the ground, the runoff flow increases and may contain various kinds of pollutants and, eventually, enters the water bodies. In some cases, especially in confined water bodies, the quality may be positive related with rainfall in the wet season, because this confined type of fresh water systems allows high dilution of pollutants, decreasing their possible impacts. During the dry season, the quality of water is largely related to industrialization and urbanization pollution. The aim of this study is to identify the most common water quality problems in Asian countries and to enumerate and analyze the methodologies used for assessment of water quality conditions of both rivers and confined water bodies (lakes and reservoirs). Based on the evaluation of a sample of 57 papers, dated between 2000 and 2012, it was found that over the past decade, the water quality of rivers, lakes, and reservoirs in developing countries is being degraded. Water pollution and destruction of aquatic ecosystems have caused massive damage to the functions and integrity of water resources. The most widespread NPS in Asian countries and those which have the greatest spatial impacts are urban runoff and agriculture. Locally, mine waste runoff and rice paddy are serious NPS problems. The most relevant point pollution sources are the effluents from factories, sewage treatment plant, and public or household facilities. It was found that the most used methodology was unquestionably the monitoring activity, used in 49 of analyzed studies, accounting for 86%. Sometimes, data from historical databases were used as well. It can be seen that taking samples from the water body and then carry on laboratory work (chemical analyses) is important because it can give an understanding of the water quality. 6 papers (11%) used a method that combined monitoring data and modeling. 6 papers (11%) just applied a model to estimate the quality of water. Modeling is a useful resource when there is limited budget since some models are of free download and use. In particular, several of used models come from the U.S.A, but they have their own purposes and features, meaning that a careful application of the models to other countries and a critical discussion of the results are crucial. 5 papers (9%) focus on a method combining monitoring data and statistical analysis. When there is a huge data matrix, the researchers need an efficient way of interpretation of the information which is provided by statistics. 3 papers (5%) used a method combining monitoring data, statistical analysis and modeling. These different methods are all valuable to evaluate the water quality. It was also found that the evaluation of water quality was made as well by using other types of sampling different than water itself, and they also provide useful information to understand the condition of the water body. These additional monitoring activities are: Air sampling, sediment sampling, phytoplankton sampling and aquatic animal tissues sampling. Despite considerable progress in developing and applying control regulations to point and NPS pollution, the pollution status of rivers, lakes, and reservoirs in Asian countries is not improving. In fact, this reflects the slow pace of investment in new infrastructure for pollution control and growing population pressures. Water laws or regulations and public involvement in enforcement can play a constructive and indispensable role in environmental protection. In the near future, in order to protect water from further contamination, rapid action is highly needed to control the various kinds of effluents in one region. Environmental remediation and treatment of industrial effluent and municipal wastewaters is essential. It is also important to prevent the direct input of agricultural and mine site runoff. Finally, stricter environmental regulation for water quality is required to support protection and management strategies. It would have been possible to get further information based in the 57 sample of papers. For instance, it would have been interesting to compare the level of concentrations of some pollutants in the diferente Asian countries. However the limit of three months duration for this study prevented further work to take place. In spite of this, the study objectives were achieved: the work provided an overview of the most relevant water quality problems in rivers, lakes and reservoirs in Asian countries, and also listed and analyzed the most common methodologies.
Resumo:
In the literature, concepts of “polyneuropathy”, “peripheral neuropathy” and “neuropathy” are often mistakenly used as synonyms. Polyneuropathy is a specific term that refers to a relatively homogenous process that affects multiple peripheral nerves. Most of these tend to present as symmetric polyneuropathies that first manifest in the distal portions of the affected nerves. Many of these distal symmetric polyneuropathies are due to toxic-metabolic causes such as alcohol abuse and diabetes mellitus. Other distal symmetric polyneuropathies may result from an overproduction of substances that result in nerve pathology such as is observed in anti-MAG neuropathy and monoclonal gammopathy of undetermined significance. Other “overproduction” disorders are hereditary such as noted in the Portuguese type of familial amyloid polyneuropathy (FAP). FAP is a manifestation of a group of hereditary amyloidoses; an autosomal dominant, multisystemic disorder wherein the mutant amyloid precursor, transthyretin, is produced in excess primarily by the liver. The liver accounts for approximately 98% of all transthyretin production. FAP is confirmed by detecting a transthyretin variant with a methionine for valine substitution at position 30 [TTR (Met30)]. Familial Amyloidotic Polyneuropathy (FAP) – Portuguese type was first described by a Portuguese neurologist, Corino de Andrade in 1939 and published in 1951. Most persons with this disorder are descended from Portuguese sailors who sired offspring in various locations, primarily in Sweden, Japan and Mallorca. Their descendants emigrated worldwide such that this disorder has been reported in other countries as well. More than 2000 symptomatic cases have been reported in Portugal. FAP progresses rapidly with an average time course from symptom onset to multi-organ involvement and death between ten and twenty years. Treatments directed at removing this aberrant protein such as plasmapheresis and immunoadsorption proved to be unsuccessful. Liver transplantation has been the only effective solution as evidenced by almost 2000 liver transplants performed worldwide. A therapy for FAP with a novel agent, “Tafamidis” has shown some promise in ongoing phase III clinical trials. It is well recognized that regular physical activity of moderate intensity has a positive effect on physical fitness as gauged by body composition, aerobic capacity, muscular strength and endurance and flexibility. Physical fitness has been reported to result in the reduction of symptoms and lesser impairment when performing activities of daily living. Exercise has been advocated as part of a comprehensive approach to the treatment of chronic diseases. Therefore, this chapter concludes with a discussion of the role of exercise training on FAP.
Resumo:
Dissertação apresentada à Escola Superior de Educação de Lisboa para a obtenção do Grau de Mestre em Ciências da Educação - especialidade Supervisão em Educação
Resumo:
Mestrado em Auditoria