905 results for Feature Quantization
Abstract:
This study addresses feature selection in classification problems. Feature selection methods keep the important features and discard redundant and irrelevant ones; we investigated this using fuzzy entropy measures. We developed a fuzzy entropy based feature selection method that uses Yu's similarity and tested it with a similarity classifier, also based on Yu's similarity, on a real-world dermatological data set. Performing feature selection based on fuzzy entropy measures before classification gave very promising empirical results: the highest classification accuracy achieved was 98.83%. Compared with results previously obtained using different similarity classifiers, these results show better accuracy. The methods reduced the dimensionality of the data set and sped up the computation of the learning algorithm, thereby simplifying the classification task.
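As a rough illustration of ranking features by a fuzzy entropy measure, the sketch below uses the De Luca-Termini entropy, which is one common choice; the abstract does not say which measure was used. The similarity values, feature names and the helper `rank_features` are hypothetical, and Yu's similarity itself is not implemented here.

```python
import math


def fuzzy_entropy(memberships):
    """De Luca-Termini fuzzy entropy of membership values in [0, 1]."""
    total = 0.0
    for mu in memberships:
        # Values of exactly 0 or 1 contribute nothing (x * log(x) -> 0).
        if 0.0 < mu < 1.0:
            total -= mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu)
    return total


def rank_features(similarities_by_feature):
    """Return feature names sorted from lowest to highest fuzzy entropy."""
    return sorted(
        similarities_by_feature,
        key=lambda name: fuzzy_entropy(similarities_by_feature[name]),
    )


# Hypothetical similarity values between samples and a class prototype.
similarities = {
    "feature_a": [0.95, 0.90, 0.05, 0.10],  # crisp: close to 0 or 1
    "feature_b": [0.50, 0.55, 0.45, 0.60],  # ambiguous: close to 0.5
}
ranking = rank_features(similarities)
```

Under this reading, features at the end of `ranking` carry the most ambiguous similarity values and would be the first candidates for removal.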
Abstract:
Green IT is a term that covers various tasks and concepts related to reducing the environmental impact of IT. At enterprise level, Green IT has significant potential to generate sustainable cost savings: the total number of devices is growing and electricity prices are rising. The lifecycle of a computer can be made more environmentally sustainable using Green IT, e.g. by using energy-efficient components and by implementing device power management. The challenge in using power management at enterprise level is how to measure and follow up the impact of power management policies. During the thesis, a power management feature was developed for a configuration management system. The feature can be used to automatically power PCs down and on according to a pre-defined schedule and to estimate the total power usage of devices. Measurements indicate that, with the feature, device power consumption can be monitored quite precisely and reduced, which generates electricity cost savings and reduces the environmental impact of IT.
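The kind of estimate such a feature might produce can be sketched as simple arithmetic: energy is power multiplied by time. The function name, the wattages and the schedule below are hypothetical values chosen for illustration and do not come from the thesis.

```python
def weekly_energy_kwh(on_hours_per_day, active_watts, off_watts=0.0):
    """Estimate one device's weekly energy use in kWh from a power schedule.

    on_hours_per_day holds seven values: hours powered on for each weekday.
    """
    if len(on_hours_per_day) != 7:
        raise ValueError("expected one value per weekday")
    total_wh = 0.0
    for hours_on in on_hours_per_day:
        # Hours not spent powered on are charged at the standby draw.
        total_wh += hours_on * active_watts + (24.0 - hours_on) * off_watts
    return total_wh / 1000.0


# Assumed figures: 60 W when on, 1 W when powered down.
always_on = weekly_energy_kwh([24] * 7, active_watts=60.0)
scheduled = weekly_energy_kwh(
    [10, 10, 10, 10, 10, 0, 0], active_watts=60.0, off_watts=1.0
)
saving = always_on - scheduled
```

Multiplying `saving` by the number of devices and an electricity price would give a first-order cost estimate, under the same assumptions.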
Abstract:
Developing software is a difficult and error-prone activity. Furthermore, the complexity of modern computer applications is significant. Hence, an organised approach to software construction is crucial. Stepwise Feature Introduction, created by R.-J. Back, is a development paradigm in which software is constructed by adding functionality in small increments. The resulting code has an organised, layered structure and can be easily reused. Moreover, interaction with the users of the software and correctness concerns are essential elements of the development process, contributing to the high quality and functionality of the final product. The paradigm of Stepwise Feature Introduction has been successfully applied in an academic environment to a number of small-scale developments. The thesis examines the paradigm and its suitability for the construction of large and complex software systems by focusing on the development of two software systems of significant complexity. Throughout the thesis we propose a number of improvements and modifications that should be applied to the paradigm when developing or reengineering large and complex software systems. The discussion in the thesis covers various aspects of software development that relate to Stepwise Feature Introduction. More specifically, we evaluate the paradigm against the common practices of object-oriented programming and design and of agile development methodologies. We also outline a strategy for testing systems built with the paradigm of Stepwise Feature Introduction.
Abstract:
This paper presents a computer program to model and support product design. The product is represented through a hierarchical structure that allows the user to navigate across the product's components, with the aim of facilitating each step of the detail design process. A graphical interface was also developed, which visually shows the user the contents of the product structure. Features are used as building blocks for the parts that compose the product, and an object-oriented methodology was used to implement the product structure. Finally, an expert system was implemented whose knowledge-base rules help the user design a product that meets design and manufacturing requirements.
Abstract:
Adrenocortical tumors (ACT) in children under 15 years of age exhibit some clinical and biological features distinct from ACT in adults. Cell proliferation, hypertrophy and cell death in the adrenal cortex during the last months of gestation and the immediate postnatal period seem to be critical for the origin of ACT in children. Studies with large numbers of patients with childhood ACT have indicated a median age at diagnosis of about 4 years. In our institution, the median age was 3 years and 5 months, while the median age for first signs and symptoms was 2 years and 5 months (N = 72). Using the comparative genomic hybridization technique, we have reported a high frequency of 9q34 amplification in adenomas and carcinomas. This finding has been confirmed more recently by investigators in England. Because the finding holds despite the lower socioeconomic status, the distinct ethnic groups and the other regional differences between patients in Southern Brazil and patients in England, these differences do not appear to determine 9q34 amplification. Candidate amplified genes mapped to this locus are currently being investigated, and Southern blot results obtained so far have ruled out amplification of the abl oncogene. Amplification of 9q34 has not been found to be related to tumor size, staging, or malignant histopathological features, nor does it seem to be responsible for the higher incidence of ACT observed in Southern Brazil, but it could be related to an ACT of embryonic origin.
Abstract:
In a serial feature-positive conditional discrimination procedure, the properties of a target stimulus A are defined by the presence or absence of a feature stimulus X preceding it. In the present experiment, composite features preceded targets associated with two operant responses of different topography (right and left bar pressing); matching and non-matching-to-sample arrangements were also used. Five water-deprived Wistar rats were trained in 6 different trials: X-R→Ar and X-L→Al, in which X and A were same-modality visual stimuli and the reinforcement was contingent on pressing either the right (r) or left (l) bar that had the light on during the feature (matching-to-sample); Y-R→Bl and Y-L→Br, in which Y and B were same-modality auditory stimuli and the reinforcement was contingent on pressing the bar that had the light off during the feature (non-matching-to-sample); A- and B- alone. After 100 training sessions, the animals were submitted to transfer tests with the targets used plus a new one (auditory click). Average percentages of stimuli with a response were measured. Acquisition occurred completely only for Y-L→Br+; however, complex associations were established over the course of training. Transfer was not complete during the tests, since concurrent effects of extinction and response generalization also occurred. Results suggest the use of both simple conditioning and configurational strategies, favoring the most recent theories of conditional discrimination learning. The implications of the use of complex arrangements for discussing these theories are considered.
Abstract:
Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is the deciphering of genetic variants that are predictive of complex diseases. Modern studies, in the form of genome-wide association studies (GWAS), have afforded researchers the opportunity to reveal new genotype-phenotype relationships through the extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Examining such data with methods other than univariate statistics is a challenging task requiring advanced algorithms that are scalable to the genome-wide level. In the future, next-generation sequencing (NGS) studies will contain an even larger number of common and rare variants. Machine learning-based feature selection algorithms have been shown to create effective predictive models for various genotype-phenotype relationships. This work explores the problem of selecting genetic variant subsets that are the most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper and embedded algorithms. The examined machine learning algorithms were shown not only to be effective at predicting the disease phenotypes, but also to do so efficiently through the use of computational shortcuts. While much of the work could be run on high-end desktops, some of it was extended to run on parallel computers, helping to ensure that the methods will also scale to NGS data sets. Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm.
It was shown that there is no universally optimal algorithm for variant selection in GWAS, but rather methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, the models can result in overly-optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype–phenotype relationships and biological insights from genetic data sets.
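The nested cross-validation mentioned above can be sketched as follows: hyperparameters are chosen on inner folds drawn only from the outer training part, so the outer test fold is never used for model selection. This is a minimal sketch on toy data; the helper names, the 1-D data set and the k-nearest-neighbour scorer are all assumptions made for illustration, not the thesis's GWAS pipeline.

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k (train, test) pairs."""
    folds = [indices[i::k] for i in range(k)]
    pairs = []
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        pairs.append((train, folds[i]))
    return pairs


def nested_cv(n_samples, candidates, evaluate, outer_k=5, inner_k=3):
    """evaluate(candidate, train_idx, test_idx) returns a score (higher is better)."""
    outer_scores = []
    for outer_train, outer_test in k_fold_indices(list(range(n_samples)), outer_k):
        # Model selection sees only the outer training part.
        def inner_score(candidate):
            scores = [
                evaluate(candidate, tr, te)
                for tr, te in k_fold_indices(outer_train, inner_k)
            ]
            return sum(scores) / len(scores)

        best = max(candidates, key=inner_score)
        # The held-out outer fold is used once, after selection.
        outer_scores.append(evaluate(best, outer_train, outer_test))
    return sum(outer_scores) / len(outer_scores)


# Toy 1-D data set (hypothetical): class 1 when x is large.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
ys = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]


def knn_accuracy(k, train_idx, test_idx):
    """Accuracy of a k-nearest-neighbour vote; k is the tuned hyperparameter."""
    correct = 0
    for t in test_idx:
        nearest = sorted(train_idx, key=lambda i: abs(xs[i] - xs[t]))[:k]
        votes = sum(ys[i] for i in nearest)
        predicted = 1 if 2 * votes > len(nearest) else 0
        correct += predicted == ys[t]
    return correct / len(test_idx)


estimate = nested_cv(len(xs), [1, 3], knn_accuracy, outer_k=4, inner_k=3)
```

Reporting the best inner-fold score instead of `estimate` would reuse the data that selected the model, which is the source of the over-optimism described above.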
Abstract:
Object detection is a fundamental task of computer vision that is utilized as a core part in a number of industrial and scientific applications, for example, in robotics, where objects need to be correctly detected and localized prior to being grasped and manipulated. Existing object detectors vary in (i) the amount of supervision they need for training, (ii) the type of a learning method adopted (generative or discriminative) and (iii) the amount of spatial information used in the object model (model-free, using no spatial information in the object model, or model-based, with the explicit spatial model of an object). Although some existing methods report good performance in the detection of certain objects, the results tend to be application specific and no universal method has been found that clearly outperforms all others in all areas. This work proposes a novel generative part-based object detector. The generative learning procedure of the developed method allows learning from positive examples only. The detector is based on finding semantically meaningful parts of the object (i.e. a part detector) that can provide additional information to object location, for example, pose. The object class model, i.e. the appearance of the object parts and their spatial variance, constellation, is explicitly modelled in a fully probabilistic manner. The appearance is based on bio-inspired complex-valued Gabor features that are transformed to part probabilities by an unsupervised Gaussian Mixture Model (GMM). The proposed novel randomized GMM enables learning from only a few training examples. The probabilistic spatial model of the part configurations is constructed with a mixture of 2D Gaussians. The appearance of the parts of the object is learned in an object canonical space that removes geometric variations from the part appearance model. 
Robustness to pose variations is achieved by object pose quantization, which is more efficient than the previously used scale and orientation shifts in the Gabor feature space. The resulting generative object detector is characterized by high recall with low precision, i.e. it produces a large number of false positive detections. A discriminative classifier is therefore used to prune the false positive candidate detections produced by the generative detector, improving its precision while keeping recall high. Using only a small number of positive examples, the developed object detector performs comparably to state-of-the-art discriminative methods.
Abstract:
In this work, the image processing program INCA Feature, used for analysing particle size distributions, was tested. Particle size distributions were determined from electron microscope images, using the INCA Feature program on projection images of the particles, for talc used as a coating pigment and for two different carbonate grades. In addition, particle size distributions were determined for silicon dioxide and aluminium oxide particles used as aids in filtration and purification. The particle size distributions determined with the image processing program were compared with distributions analysed with the SediGraph 5100 analyser, which is based on particle settling velocity (sedimentation), and with the Coulter LS 230 method, which is based on laser diffraction. The SediGraph 5100 and the image analysis program gave a very similar mean for the size distribution of talc particles. In contrast, the mean of the size distribution given by the Coulter LS 230 instrument differed from these. All the particle size distribution methods compared ranked the particles of the different samples in the same order of size. However, the results of the methods cannot be compared numerically with each other, because in each of the analysis methods used the measurement of particle size is based on a different property of the particle. On the basis of this work, all the tested analysis methods are suitable for determining the particle size distributions of paper pigments. This work also determined the number of particles required in image analysis for the result to be reliable. It was found that at least 300 particles must be analysed. Too large a sample increases the scatter of the size distribution and lengthens the analysis time to several hours. Sample handling still requires further study, as it is the most important and most critical stage of particle size analysis performed with SEM and an image analysis program.
The growing availability of automated microscopes makes analyses easier and faster, so the popularity of the method will grow in paper pigment research as well. The high price of the equipment and the special expertise required of the user will, at least for the time being, restrict its use to research institutes.
Abstract:
A feature-based fitness function is applied in a genetic programming system to synthesize stochastic gene regulatory network models whose behaviour is defined by a time course of protein expression levels. Typically, when targeting time series data, the fitness function is based on a sum of errors involving the values of the fluctuating signal. While this approach is successful in many instances, its performance can deteriorate in the presence of noise. This thesis explores a fitness measure determined from a set of statistical features characterizing the time series' sequence of values, rather than the actual values themselves. Through a series of experiments involving symbolic regression with added noise and gene regulatory network models based on the stochastic π-calculus, it is shown to successfully target oscillating and non-oscillating signals. This practical and versatile fitness function offers an alternative approach, worthy of consideration for use in algorithms that evaluate noisy or stochastic behaviour.
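The contrast between a sum-of-errors fitness and a feature-based one can be sketched on a toy signal. The three features used here (mean, standard deviation, rate of crossings of the mean) are assumptions chosen for brevity; the abstract does not list the thesis's actual feature set.

```python
import math
import statistics


def series_features(values):
    """Summarize a time series by a few statistical features."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    # Sign changes around the mean serve as a crude oscillation measure.
    centered = [v - mean for v in values]
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if a * b < 0)
    return [mean, std, crossings / (len(values) - 1)]


def feature_fitness(candidate, target):
    """Lower is better: distance between feature vectors, not raw values."""
    pairs = zip(series_features(candidate), series_features(target))
    return sum(abs(c - t) for c, t in pairs)


def sum_of_errors(candidate, target):
    """Lower is better: point-by-point absolute error."""
    return sum(abs(c - t) for c, t in zip(candidate, target))


target = [math.sin(0.5 * t) for t in range(100)]
# Same oscillation as the target, but in opposite phase.
shifted = [math.sin(0.5 * t + math.pi) for t in range(100)]
flat = [0.0] * 100
```

On these signals the sum of errors scores the flat line better than the phase-shifted oscillation, while the feature-based measure prefers the oscillation, which illustrates why the two fitness definitions can steer a search differently.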
Abstract:
The curse of dimensionality is a major problem in the fields of machine learning, data mining and knowledge discovery. Exhaustive search for the optimal subset of relevant features in a high-dimensional dataset is NP-hard. Sub-optimal population-based stochastic algorithms such as GP and GA are good choices for searching through large search spaces, and are usually more feasible than exhaustive and deterministic search algorithms. On the other hand, population-based stochastic algorithms often suffer from premature convergence on mediocre sub-optimal solutions. The Age Layered Population Structure (ALPS) is a novel metaheuristic for overcoming the problem of premature convergence in evolutionary algorithms, and for improving search in the fitness landscape. The ALPS paradigm uses an age measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification in varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal-symbol selection for the construction of GP trees and sub-trees. The FSALPS metaheuristic continuously refines the feature subset selection process while simultaneously evolving efficient classifiers through a non-converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high-dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey's HSD ANOVA test at a 95% confidence interval, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with fewer bloat expressions.
FSALPS significantly outperformed canonical GP and ALPS and some reported feature selection strategies in related literature on dimensionality reduction.
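The step from feature frequencies to selection probabilities can be sketched roughly as below. Representing each GP individual as a flat list of terminal names, the additive smoothing term and the helper names are all simplifying assumptions; the abstract does not specify how FSALPS performs the translation.

```python
import random
from collections import Counter


def feature_probabilities(population, all_features, smoothing=1.0):
    """Turn feature usage counts across a population into probabilities."""
    counts = Counter(f for individual in population for f in individual)
    # Smoothing (an assumption) keeps unused features selectable.
    weights = {f: counts[f] + smoothing for f in all_features}
    total = sum(weights.values())
    return {f: w / total for f, w in weights.items()}


def pick_terminal(probabilities, rng):
    """Sample one feature to use as a terminal symbol."""
    features = list(probabilities)
    weights = [probabilities[f] for f in features]
    return rng.choices(features, weights=weights, k=1)[0]


# Hypothetical population: each individual lists the terminals it uses.
population = [["x1", "x3", "x3"], ["x3", "x2"], ["x3", "x1"]]
probs = feature_probabilities(population, ["x1", "x2", "x3", "x4"])
terminal = pick_terminal(probs, random.Random(0))
```

Features that appear often in the evolved population, such as `x3` here, are then more likely to be drawn when new trees or sub-trees are built.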
Abstract:
New Feature at Niagara – Clark Hill Islands (5 islands situated in the rapids of the Niagara River). These islands are currently known as Dufferin Islands, 22 ½ cm. x 15 ½ cm, n.d.
Abstract:
Supervised learning of large-scale hierarchical networks is currently enjoying tremendous success. Despite this excitement, unsupervised learning still represents, according to many researchers, a key element of Artificial Intelligence, where agents must learn from a potentially limited amount of data. This thesis follows that line of thought and addresses several research topics related to the problem of density estimation by means of Boltzmann machines (BMs), probabilistic graphical models at the heart of deep learning. Our contributions span sampling, partition function estimation, optimization, and the learning of invariant representations. The thesis begins by presenting a new adaptive sampling algorithm, which automatically adjusts the temperature of the simulated Markov chains in order to maintain a high convergence speed throughout learning. When used in the context of stochastic maximum likelihood (SML) learning, our algorithm yields increased robustness to the choice of learning rate, as well as better convergence speed. Our results are presented for BMs, but the method is general and applicable to the learning of any probabilistic model that relies on Markov chain sampling. While the maximum likelihood gradient can be approximated by sampling, evaluating the log-likelihood requires an estimate of the partition function. Unlike traditional approaches that treat a given model as a black box, we instead propose to exploit the dynamics of learning by estimating the successive changes in log-partition incurred at each parameter update.
The estimation problem is reformulated as an inference problem similar to the Kalman filter, but on a two-dimensional graph whose dimensions correspond to the time axis and the temperature parameter. On the topic of optimization, we also present an algorithm that makes it possible to apply the natural gradient efficiently to Boltzmann machines with thousands of units. Until now, its adoption was limited by its high computational cost and memory requirements. Our algorithm, Metric-Free Natural Gradient (MFNG), avoids the explicit computation of the Fisher information matrix (and its inverse) by exploiting a linear solver combined with an efficient matrix-vector product. The algorithm is promising: in terms of the number of function evaluations, MFNG converges faster than SML. Its implementation unfortunately remains inefficient in computation time. This work also explores the mechanisms underlying the learning of invariant representations. To this end, we use the family of "spike & slab" restricted Boltzmann machines (ssRBM), which we modify so that they can model binary and sparse distributions. The binary latent variables of the ssRBM can be made invariant to a vector subspace by associating with each of them a vector of continuous latent variables (called "slabs"). This results in increased invariance at the representation level and a better classification rate when little labelled data is available. We conclude this thesis with an ambitious topic: learning representations that can separate the factors of variation present in the input signal. We propose a solution based on a bilinear ssRBM (with two groups of latent factors) and formulate the problem as one of "pooling" in complementary vector subspaces.
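The general idea of obtaining a natural-gradient direction from a linear solver and matrix-vector products, without ever forming or inverting the Fisher matrix, can be sketched as follows. This is not the MFNG implementation: conjugate gradient is assumed as the solver, the Fisher matrix is replaced by a damped, uncentered average of outer products of per-sample gradients, and all names and numbers are hypothetical.

```python
def conjugate_gradient(matvec, b, iterations=50, tol=1e-10):
    """Solve A x = b for symmetric positive-definite A, given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(r)
    rs_old = sum(ri * ri for ri in r)
    for _ in range(iterations):
        if rs_old < tol:
            break
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def fisher_vector_product(per_sample_grads, v, damping=1e-3):
    """Compute (F + damping * I) v with F = mean of g g^T, without building F."""
    n = len(per_sample_grads)
    out = [damping * vi for vi in v]
    for g in per_sample_grads:
        dot = sum(gi * vi for gi, vi in zip(g, v))
        for i, gi in enumerate(g):
            out[i] += gi * dot / n
    return out


# Hypothetical per-sample gradients for a two-parameter model.
grads = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
gradient = [sum(g[i] for g in grads) / len(grads) for i in range(2)]
natural_direction = conjugate_gradient(
    lambda v: fisher_vector_product(grads, v), gradient
)
```

Only products of the matrix with a vector are ever computed, so memory grows with the number of parameters rather than with its square.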
Abstract:
Magnetic Resonance Imaging (MRI) is a multi-sequence medical imaging technique in which stacks of images are acquired with different tissue contrasts. Simultaneous observation and quantitative analysis of normal brain tissues and small abnormalities from these large numbers of different sequences is a great challenge in clinical applications. Multispectral MRI analysis can simplify the job considerably by combining an unlimited number of available co-registered sequences in a single suite. However, the poor performance of the multispectral system with conventional image classification and segmentation methods makes it inappropriate for clinical analysis. Recent work in multispectral brain MRI analysis has attempted to resolve this issue with improved feature extraction approaches, such as transform-based methods, fuzzy approaches, algebraic techniques and so forth. Transform-based feature extraction methods such as Independent Component Analysis (ICA) and its extensions have been used effectively in recent studies to improve the performance of multispectral brain MRI analysis. However, these global transforms were found to be inefficient and inconsistent in identifying less frequently occurring features, such as small lesions, in a large amount of MR data. The present thesis focuses on improving ICA-based feature extraction techniques to enhance the performance of multispectral brain MRI analysis. Methods using spectral clustering and wavelet transforms are proposed to resolve the inefficiency of ICA in identifying small abnormalities, and the problems due to ICA over-completeness. The effectiveness of the new methods in brain tissue classification and segmentation is confirmed by a detailed quantitative and qualitative analysis with synthetic and clinical, normal and abnormal, data. In comparison to conventional classification techniques, the proposed algorithms provide better performance in the classification of normal brain tissues and significant small abnormalities.