896 results for Feature Descriptors
Abstract:
Developing software is a difficult and error-prone activity, and the complexity of modern computer applications is significant. Hence, an organised approach to software construction is crucial. Stepwise Feature Introduction, created by R.-J. Back, is a development paradigm in which software is constructed by adding functionality in small increments. The resulting code has an organised, layered structure and can be easily reused. Moreover, interaction with the users of the software and correctness concerns are essential elements of the development process, contributing to the high quality and functionality of the final product. The paradigm of Stepwise Feature Introduction has been successfully applied in an academic environment to a number of small-scale developments. The thesis examines the paradigm and its suitability for the construction of large and complex software systems by focusing on the development of two software systems of significant complexity. Throughout the thesis we propose a number of improvements and modifications that should be applied to the paradigm when developing or reengineering large and complex software systems. The discussion in the thesis covers various aspects of software development that relate to Stepwise Feature Introduction. More specifically, we evaluate the paradigm against the common practices of object-oriented programming and design and of agile development methodologies. We also outline a strategy for testing systems built with the paradigm of Stepwise Feature Introduction.
Abstract:
This paper presents a computer program to model and support product design. The product is represented through a hierarchical structure that allows the user to navigate across the product's components and that aims to facilitate each step of the detail design process. A graphical interface was also developed, which shows the user the contents of the product structure visually. Features are used as building blocks for the parts that compose the product, and object-oriented methodology was used to implement the product structure. Finally, an expert system was also implemented, whose knowledge base rules help the user design a product that meets design and manufacturing requirements.
Abstract:
Adrenocortical tumors (ACT) in children under 15 years of age exhibit some clinical and biological features distinct from ACT in adults. Cell proliferation, hypertrophy and cell death in the adrenal cortex during the last months of gestation and the immediate postnatal period seem to be critical for the origin of ACT in children. Studies with large numbers of patients with childhood ACT have indicated a median age at diagnosis of about 4 years. In our institution, the median age at diagnosis was 3 years and 5 months, while the median age at first signs and symptoms was 2 years and 5 months (N = 72). Using the comparative genomic hybridization technique, we have reported a high frequency of 9q34 amplification in adenomas and carcinomas. This finding has been confirmed more recently by investigators in England. The lower socioeconomic status, distinct ethnic groups and other regional differences of patients in Southern Brazil relative to patients in England indicate that these factors are not important determinants of 9q34 amplification. Candidate amplified genes mapped to this locus are currently being investigated, and Southern blot results obtained so far have ruled out amplification of the abl oncogene. Amplification of 9q34 has not been found to be related to tumor size, staging, or malignant histopathological features, nor does it seem to be responsible for the higher incidence of ACT observed in Southern Brazil, but it could be related to ACT of embryonic origin.
Abstract:
Feature extraction is the part of pattern recognition in which sensor data is transformed into a form more suitable for the machine to interpret. The purpose of this step is also to reduce the amount of information passed to the next stages of the system while preserving the information essential for discriminating the data into different classes. For instance, in image analysis the actual image intensities are vulnerable to various environmental effects, such as lighting changes, and feature extraction can be used to detect features that are invariant to certain types of illumination changes. Finally, classification makes decisions based on the transformed data. The main focus of this thesis is on developing new methods for embedded feature extraction based on local non-parametric image descriptors. Feature analysis is also carried out for the selected image features, in which low-level Local Binary Pattern (LBP) based features play the main role. In the embedded domain, a pattern recognition system must usually meet strict performance constraints, such as high speed, compact size and low power consumption. The characteristics of the final system can be seen as a trade-off between these metrics, which is largely affected by the decisions made during the implementation phase. The implementation alternatives for LBP-based feature extraction are explored in the embedded domain in the context of focal-plane vision processors. In particular, the thesis demonstrates LBP extraction with the MIPA4k massively parallel focal-plane processor IC. Higher-level processing is also incorporated, by means of a framework for implementing a single-chip face recognition system. Furthermore, a new method for determining optical flow based on LBPs, designed in particular for the embedded domain, is presented.
Inspired by some of the principles observed through the feature analysis of the Local Binary Patterns, an extension to the well-known non-parametric rank transform is proposed, and its performance is evaluated in face recognition experiments with a standard dataset. Finally, an a priori model in which the LBPs are seen as combinations of n-tuples is also presented.
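To make the descriptors mentioned in this abstract concrete, here is a minimal sketch of the basic 3x3 LBP operator and the classic rank transform in plain Python. It is not the thesis's embedded implementation or its proposed extension. The function names, the neighbour ordering and the toy image are assumptions chosen for illustration.

```python
def lbp_code(image, row, col):
    """Return the 8-bit LBP code of the interior pixel at (row, col)."""
    center = image[row][col]
    # Neighbour offsets, clockwise from the top-left pixel (ordering is an assumption).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the centre sets its bit.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels, usable as a feature vector."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


def rank_transform(image, row, col):
    """Classic rank transform: count the 3x3 neighbours darker than the centre."""
    center = image[row][col]
    return sum(1
               for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0) and image[row + dr][col + dc] < center)


# Toy 3x3 image and a brighter, higher-contrast copy of it.
image = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
brighter = [[2 * v + 10 for v in row] for row in image]
```

Because both operators depend only on the ordering of intensities, `lbp_code(image, 1, 1)` and `lbp_code(brighter, 1, 1)` give the same code. This is the kind of illumination invariance the abstract refers to.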
Abstract:
In a serial feature-positive conditional discrimination procedure, the properties of a target stimulus A are defined by the presence or absence of a feature stimulus X preceding it. In the present experiment, composite features preceded targets associated with two operant responses of different topography (right and left bar pressing); matching and non-matching-to-sample arrangements were also used. Five water-deprived Wistar rats were trained with 6 different trial types: X-R→Ar and X-L→Al, in which X and A were visual stimuli of the same modality and reinforcement was contingent on pressing either the right (r) or left (l) bar that had the light on during the feature (matching-to-sample); Y-R→Bl and Y-L→Br, in which Y and B were auditory stimuli of the same modality and reinforcement was contingent on pressing the bar that had the light off during the feature (non-matching-to-sample); and A- and B- alone. After 100 training sessions, the animals were subjected to transfer tests with the targets used plus a new one (an auditory click). Average percentages of stimuli with a response were measured. Acquisition occurred completely only for Y-L→Br+; however, complex associations were established during training. Transfer was not complete during the tests, since concurrent effects of extinction and response generalization also occurred. Results suggest the use of both simple conditioning and configurational strategies, favoring the most recent theories of conditional discrimination learning. The implications of the use of complex arrangements for discussing these theories are considered.
Abstract:
Personalized medicine will revolutionize our capabilities to combat disease. Working toward this goal, a fundamental task is deciphering the genetic variants that are predictive of complex diseases. Modern studies, in the form of genome-wide association studies (GWAS), have afforded researchers the opportunity to reveal new genotype-phenotype relationships through the extensive scanning of genetic variants. These studies typically contain over half a million genetic features for thousands of individuals. Examining such data with methods other than univariate statistics is a challenging task requiring advanced algorithms that scale to the genome-wide level. In the future, next-generation sequencing (NGS) studies will contain an even larger number of common and rare variants. Machine learning-based feature selection algorithms have been shown to create effective predictive models for various genotype-phenotype relationships. This work explores the problem of selecting the genetic variant subsets that are most predictive of complex disease phenotypes through various feature selection methodologies, including filter, wrapper and embedded algorithms. The examined machine learning algorithms were demonstrated to be not only effective at predicting the disease phenotypes, but also efficient, through the use of computational shortcuts. While much of the work could be run on high-end desktops, some of it was extended to run on parallel computers, helping to ensure that the methods will also scale to NGS data sets. Further, these studies analyzed the relationships between various feature selection methods and demonstrated the need for careful testing when selecting an algorithm.
It was shown that there is no universally optimal algorithm for variant selection in GWAS; rather, methodologies need to be selected based on the desired outcome, such as the number of features to be included in the prediction model. It was also demonstrated that without proper model validation, for example using nested cross-validation, the models can yield overly optimistic prediction accuracies and decreased generalization ability. It is through the implementation and application of machine learning methods that one can extract predictive genotype-phenotype relationships and biological insights from genetic data sets.
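The nested cross-validation mentioned above can be sketched as follows, assuming a simple filter selector (difference of class means) and a nearest-centroid classifier. Neither is taken from the thesis, and the helper names and the toy genotype-like data are hypothetical. The key point is that feature selection and the choice of the number of features only ever see the training rows of the current fold.

```python
from statistics import mean


def k_folds(n, k):
    """Deterministic folds: fold i holds indices i, i+k, i+2k, ..."""
    return [list(range(start, n, k)) for start in range(k)]


def class_means(X, y, rows, feature):
    """Mean of one feature per class label, over the given rows."""
    return {label: mean(X[r][feature] for r in rows if y[r] == label)
            for label in set(y[r] for r in rows)}


def select_features(X, y, rows, k):
    """Filter selection: keep the k features whose class means differ most."""
    def score(f):
        m = class_means(X, y, rows, f)
        return max(m.values()) - min(m.values())
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]


def evaluate(X, y, train, test, k):
    """Select features and fit centroids on train only, then score on test."""
    feats = select_features(X, y, train, k)
    centroids = {f: class_means(X, y, train, f) for f in feats}
    labels = sorted(set(y[r] for r in train))
    correct = 0
    for r in test:
        def dist(label):
            return sum((X[r][f] - centroids[f][label]) ** 2 for f in feats)
        correct += min(labels, key=dist) == y[r]
    return correct / len(test)


def nested_cv(X, y, outer_k, inner_k, candidate_ks):
    """Outer loop estimates accuracy; inner loop picks the number of features."""
    outer_scores, chosen = [], []
    for test in k_folds(len(X), outer_k):
        held_out = set(test)
        train = [r for r in range(len(X)) if r not in held_out]

        def inner_score(k):
            scores = []
            for fold in k_folds(len(train), inner_k):
                in_fold = set(fold)
                val = [train[p] for p in fold]
                fit = [train[p] for p in range(len(train)) if p not in in_fold]
                scores.append(evaluate(X, y, fit, val, k))
            return mean(scores)

        best_k = max(candidate_ks, key=inner_score)  # ties go to the first candidate
        chosen.append(best_k)
        outer_scores.append(evaluate(X, y, train, test, best_k))
    return mean(outer_scores), chosen


# Toy data (an assumption): feature 0 separates the classes, features 1-2 are noise.
y = [i % 2 for i in range(12)]
X = [[2 * (i % 2), (i // 2) % 2, (i // 4) % 2] for i in range(12)]
score, chosen = nested_cv(X, y, outer_k=3, inner_k=2, candidate_ks=[1, 2])
```

Selecting features on the full data set before splitting would leak information from the test folds into the model. That leak is one way the overly optimistic accuracies described in the abstract can arise.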
Abstract:
The odor and taste profiles of cocoa bean samples obtained from trees cultivated in southern Mexico were evaluated by trained panelists. Seven representative samples (groups) out of a total of 45 were analyzed. Four attributes of taste (sweetness, bitterness, acidity and astringency) and nine of odor (chocolate, nutty, hazelnut, sweet, acidity, roasted, spicy, musty and off-odor) were evaluated. One sample (G7) with higher scores for sweet taste and for sweet and nutty odors was detected, and principal component analysis (PCA) showed a high association between these descriptors and the sample. Similarly, samples with high scores for odors that are undesirable in cocoa, such as off-odor and musty, were identified and related by PCA to roasted odor and astringent taste (G2 and G4). Based on these scores, the samples were ranked in descending order of sensory quality as G7 > G5 > G6 > G3 > G1 > G4 > G2.
Abstract:
This work tested INCA Feature, an image-processing program used for analysing particle size distributions. Particle size distributions were determined from electron microscope images, using INCA Feature on the projected images of the particles, for talc used as a coating pigment and for two different carbonate grades. In addition, particle size distributions were determined for silicon dioxide and aluminium oxide particles used as auxiliary agents in filtration and purification. The distributions determined with the image-processing program were compared with those measured by the SediGraph 5100 analyser, which is based on particle settling velocity (sedimentation), and by the Coulter LS 230 method, which is based on laser diffraction. SediGraph 5100 and the image analysis program gave very similar mean values for the size distribution of the talc particles. The mean given by the Coulter LS 230 instrument, by contrast, differed from these. All of the methods compared ranked the particles of the different samples in the same order of size. However, the results of the methods cannot be compared numerically, because each method measures particle size on the basis of a different property of the particle. Based on this work, all of the tested methods are suitable for determining the particle size distributions of paper pigments. The work also determined the number of particles needed for a reliable image analysis result. It was found that at least 300 particles must be analysed. Too large a sample increases the spread of the size distribution and extends the analysis time to several hours. Sample preparation still requires further study, as it is the most important and most critical stage of particle size analysis carried out with SEM and an image analysis program.
The spread of automated microscopes makes analyses easier and faster, so the popularity of the method will grow in paper pigment research as well. The high price of the equipment and the specialist expertise required of the user will limit its use to research institutes, at least for the time being.
Abstract:
A feature-based fitness function is applied in a genetic programming system to synthesize stochastic gene regulatory network models whose behaviour is defined by a time course of protein expression levels. Typically, when targeting time series data, the fitness function is based on a sum of errors involving the values of the fluctuating signal. While this approach is successful in many instances, its performance can deteriorate in the presence of noise. This thesis explores a fitness measure determined from a set of statistical features characterizing the time series' sequence of values, rather than the actual values themselves. Through a series of experiments involving symbolic regression with added noise and gene regulatory network models based on the stochastic π-calculus, the measure is shown to successfully target oscillating and non-oscillating signals. This practical and versatile fitness function offers an alternative approach, worthy of consideration for use in algorithms that evaluate noisy or stochastic behaviour.
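The contrast between a sum-of-errors fitness and a feature-based fitness can be illustrated with a small sketch. The three features used here (mean, standard deviation and number of mean crossings) are assumptions picked for illustration. The abstract does not list the thesis's actual feature set.

```python
from statistics import mean, pstdev


def sum_of_errors(target, candidate):
    """Pointwise fitness: total absolute difference between the two series."""
    return sum(abs(t - c) for t, c in zip(target, candidate))


def features(series):
    """Illustrative feature vector: mean, spread, and number of mean crossings."""
    m = mean(series)
    crossings = sum(1 for a, b in zip(series, series[1:])
                    if (a - m) * (b - m) < 0)
    return [m, pstdev(series), crossings]


def feature_fitness(target, candidate):
    """Feature-based fitness: distance between feature vectors (lower is better)."""
    return sum(abs(a - b)
               for a, b in zip(features(target), features(candidate)))


target = [0, 1, 0, 1, 0, 1, 0, 1]    # oscillating signal
shifted = [1, 0, 1, 0, 1, 0, 1, 0]   # same oscillation, shifted by one step
flat = [0.5] * 8                     # no oscillation at all
```

With these series, `sum_of_errors` scores the flat line (4.0) better than the phase-shifted oscillation (8), whereas `feature_fitness` scores the shifted oscillation as a perfect match (0) and the flat line poorly (7.5). This shows why a feature-based measure can be the more forgiving target when the signal's timing is noisy or stochastic.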
Abstract:
The curse of dimensionality is a major problem in the fields of machine learning, data mining and knowledge discovery. Exhaustive search for the optimal subset of relevant features from a high-dimensional dataset is NP-hard. Sub-optimal population-based stochastic algorithms such as GP and GA are good choices for searching through large search spaces, and are usually more feasible than exhaustive and deterministic search algorithms. On the other hand, population-based stochastic algorithms often suffer from premature convergence on mediocre sub-optimal solutions. The Age Layered Population Structure (ALPS) is a novel metaheuristic for overcoming the problem of premature convergence in evolutionary algorithms, and for improving search in the fitness landscape. The ALPS paradigm uses an age measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification in varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal-symbol selection for the construction of GP trees and sub-trees. The FSALPS metaheuristic continuously refines the feature subset selection process while simultaneously evolving efficient classifiers through a non-converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high-dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey’s HSD ANOVA test at the 95% confidence level, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with less bloat.
FSALPS significantly outperformed canonical GP and ALPS, as well as some feature selection strategies reported in the related literature on dimensionality reduction.
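The frequency-count idea described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the FSALPS implementation. Trees are represented as flat lists of the feature indices they use as terminals, counts are mapped to probabilities proportionally, and a smoothing term (hypothetical) keeps unused features selectable. The thesis's exact ranking-to-probability mapping is not given in the abstract.

```python
import random
from collections import Counter


def feature_frequencies(population, n_features):
    """Count how often each feature index appears as a terminal in the population."""
    counts = Counter()
    for tree in population:
        counts.update(tree)  # each tree is a flat list of feature indices (assumption)
    return [counts[f] for f in range(n_features)]


def selection_probabilities(freqs, smoothing=1.0):
    """Turn counts into probabilities; smoothing is an assumed safeguard."""
    total = sum(freqs) + smoothing * len(freqs)
    return [(f + smoothing) / total for f in freqs]


def pick_terminal(probs, rng):
    """Sample a feature index for a new terminal node, biased by the probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Three toy individuals over four features.
population = [[0, 2, 2], [2, 3], [0, 2]]
freqs = feature_frequencies(population, n_features=4)
probs = selection_probabilities(freqs)
terminal = pick_terminal(probs, random.Random(0))
```

Features that appear often in evolved individuals (feature 2 here) become more likely to be chosen when new trees or sub-trees are built, which gradually concentrates the search on a smaller feature subset.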
Abstract:
New Feature at Niagara – Clark Hill Islands (5 islands situated in the rapids of the Niagara River). These islands are currently known as Dufferin Islands, 22 ½ cm. x 15 ½ cm, n.d.
Abstract:
Supervised learning of large-scale hierarchical networks is currently enjoying spectacular success. Despite this excitement, many researchers still regard unsupervised learning as a key element of Artificial Intelligence, where agents must learn from a potentially limited amount of data. This thesis follows that line of thought and addresses several research topics related to the problem of density estimation with Boltzmann machines (BMs), the probabilistic graphical models at the heart of deep learning. Our contributions span sampling, partition function estimation, optimization, and the learning of invariant representations. The thesis begins by presenting a new adaptive sampling algorithm that automatically adjusts the temperatures of the simulated Markov chains in order to maintain a high convergence speed throughout learning. When used for stochastic maximum likelihood (SML) learning, our algorithm yields greater robustness to the choice of learning rate as well as faster convergence. Our results are presented for BMs, but the method is general and applies to learning any probabilistic model that relies on Markov chain sampling. While the maximum likelihood gradient can be approximated by sampling, evaluating the log-likelihood requires an estimate of the partition function. Unlike traditional approaches, which treat a given model as a black box, we propose instead to exploit the learning dynamics by estimating the successive changes in the log-partition function incurred at each parameter update.
The estimation problem is recast as an inference problem similar to the Kalman filter, but on a two-dimensional graph whose dimensions correspond to the time axis and the temperature parameter. On the optimization side, we also present an algorithm for efficiently applying the natural gradient to Boltzmann machines with thousands of units. Until now, its adoption had been limited by its high computational cost and memory requirements. Our algorithm, Metric-Free Natural Gradient (MFNG), avoids explicitly computing the Fisher information matrix (and its inverse) by using a linear solver combined with an efficient matrix-vector product. The algorithm is promising: in terms of the number of function evaluations, MFNG converges faster than SML. Unfortunately, its implementation remains inefficient in terms of computation time. This work also explores the mechanisms underlying the learning of invariant representations. To this end, we use the family of “spike & slab” restricted Boltzmann machines (ssRBM), which we modify so that they can model binary and sparse distributions. The binary latent variables of the ssRBM can be made invariant to a vector subspace by associating with each of them a vector of continuous latent variables (called “slabs”). This results in increased invariance at the representation level and a better classification rate when little labelled data is available. We conclude the thesis with an ambitious topic: learning representations that can disentangle the factors of variation present in the input signal. We propose a solution based on a bilinear ssRBM (with two groups of latent factors) and formulate the problem as one of “pooling” in complementary vector subspaces.
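The general idea of avoiding an explicit Fisher matrix can be sketched with a matrix-free conjugate gradient solve. This is a generic illustration, not the thesis's MFNG algorithm. The Fisher matrix is approximated here as the average outer product of per-sample score vectors, a damping term (an assumption) keeps the system well conditioned, and all names and numbers are hypothetical.

```python
def fisher_vector_product(scores, v, damping):
    """Compute (F + damping * I) v with F = mean(s s^T), without ever forming F."""
    n = len(scores)
    out = [damping * vi for vi in v]
    for s in scores:
        dot = sum(si * vi for si, vi in zip(s, v))
        for i, si in enumerate(s):
            out[i] += si * dot / n
    return out


def conjugate_gradient(matvec, b, iters=50, tol=1e-12):
    """Solve A x = b for symmetric positive definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = list(b)                     # residual b - A x, with x = 0 initially
    p = list(r)                     # current search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(iters):
        if rs < tol:
            break
        Ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Toy per-sample score vectors and an ordinary gradient (illustrative numbers).
scores = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
gradient = [1.0, 1.0]


def matvec(v):
    return fisher_vector_product(scores, v, damping=0.1)


natural_gradient = conjugate_gradient(matvec, gradient)
```

Each conjugate gradient iteration costs one matrix-vector product, so memory grows with the number of parameters rather than with its square. This is the property that makes this style of solver attractive for models with thousands of units.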
Abstract:
The wealth of information freely available on the web and in medical image databases poses a major problem for end users: how to find the information needed? Content-Based Image Retrieval is the obvious solution. A standard called MPEG-7 was developed to address the interoperability issues of content-based search. The work presented in this thesis concentrates mainly on developing new shape descriptors and a framework for content-based retrieval of scoliosis images. New region-based and contour-based shape descriptors are developed based on orthogonal Legendre polynomials. A novel system for indexing and retrieval of digital spine radiographs with scoliosis is presented.
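One common way to build a region-based descriptor from orthogonal Legendre polynomials is through Legendre moments of the image. The abstract does not say whether the thesis's descriptors take exactly this form, so the following is only a sketch under that assumption, with hypothetical function names and a pixel-centre discretisation.

```python
def legendre(n, x):
    """P_n(x) via Bonnet's recurrence: (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def legendre_moment(image, p, q):
    """Approximate Legendre moment of order (p, q) on the square [-1, 1] x [-1, 1]."""
    rows, cols = len(image), len(image[0])
    dx, dy = 2.0 / cols, 2.0 / rows
    total = 0.0
    for r in range(rows):
        y = -1.0 + (r + 0.5) * dy          # pixel-centre coordinate
        for c in range(cols):
            x = -1.0 + (c + 0.5) * dx
            total += legendre(p, x) * legendre(q, y) * image[r][c]
    return (2 * p + 1) * (2 * q + 1) / 4.0 * total * dx * dy


def descriptor(image, max_order):
    """Feature vector of all moments with p + q <= max_order."""
    return [legendre_moment(image, p, q)
            for p in range(max_order + 1)
            for q in range(max_order + 1 - p)]


# Toy binary shapes: a full square and a shape occupying only the left half.
full = [[1] * 4 for _ in range(4)]
left_half = [[1, 1, 0, 0] for _ in range(4)]
```

For retrieval, two shapes could then be compared by the distance between their `descriptor` vectors. For example, the order-(1, 0) moment is zero for `full` but negative for `left_half`, reflecting where the shape's mass lies.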
Abstract:
Magnetic Resonance Imaging (MRI) is a multi-sequence medical imaging technique in which stacks of images are acquired with different tissue contrasts. Simultaneous observation and quantitative analysis of normal brain tissues and small abnormalities across this large number of different sequences is a great challenge in clinical applications. Multispectral MRI analysis can simplify the job considerably by combining any number of available co-registered sequences in a single suite. However, the poor performance of multispectral systems with conventional image classification and segmentation methods makes them inappropriate for clinical analysis. Recent work in multispectral brain MRI analysis has attempted to resolve this issue through improved feature extraction approaches, such as transform-based methods, fuzzy approaches, algebraic techniques and so forth. Transform-based feature extraction methods such as Independent Component Analysis (ICA) and its extensions have been used effectively in recent studies to improve the performance of multispectral brain MRI analysis. However, these global transforms were found to be inefficient and inconsistent in identifying less frequently occurring features, such as small lesions, in large amounts of MR data. The present thesis focuses on improving ICA-based feature extraction techniques to enhance the performance of multispectral brain MRI analysis. Methods using spectral clustering and wavelet transforms are proposed to resolve the inefficiency of ICA in identifying small abnormalities, and the problems due to ICA over-completeness. The effectiveness of the new methods in brain tissue classification and segmentation is confirmed by a detailed quantitative and qualitative analysis with synthetic and clinical, normal and abnormal, data. In comparison with conventional classification techniques, the proposed algorithms provide better performance in the classification of normal brain tissues and significant small abnormalities.