867 results for least square-support vector machine
Abstract:
Dissertation presented as a partial requirement for the degree of Master in Statistics and Information Management
Abstract:
Dissertation submitted for the degree of Master in Biomedical Engineering
Diagnostic improvement of diffuse liver diseases using artificial intelligence techniques
Abstract:
Automatic diagnostic discrimination is an application of artificial intelligence techniques that can solve clinical cases based on imaging. Diffuse liver diseases are widely prevalent in the population and follow an insidious course from early in their progression. Early and effective diagnosis is necessary because many of these diseases progress to cirrhosis and liver cancer. The usual technique of choice for an accurate diagnosis is liver biopsy, an invasive procedure that is not free of contraindications. This project proposes an alternative non-invasive method, free of contraindications, based on liver ultrasonography. The images are digitized and then analyzed using statistical techniques and texture analysis. The results are validated against the pathology report. Finally, we apply artificial intelligence techniques such as Fuzzy k-Means and Support Vector Machines and compare their significance against the statistical analysis and the clinician's report. The results show that this technique is significantly valid and a promising non-invasive alternative for diagnosing chronic diffuse liver disease. Artificial intelligence classification techniques significantly improve diagnostic discrimination compared to the other statistical methods.
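Since the abstract names texture analysis followed by Support Vector Machine classification, here is a minimal Python sketch of that pipeline, assuming simple first-order texture statistics and placeholder ultrasound patches and labels (none of this is the study's actual feature set or data):

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def texture_features(patch):
        """First-order texture statistics of a grayscale ultrasound patch."""
        hist, _ = np.histogram(patch, bins=32, range=(0, 256), density=True)
        hist = hist[hist > 0]
        entropy = -np.sum(hist * np.log2(hist))
        return [patch.mean(), patch.std(), entropy]

    # Placeholder data: 64x64 patches, binary labels (0 = healthy, 1 = diffuse disease).
    rng = np.random.default_rng(0)
    patches = rng.integers(0, 256, size=(200, 64, 64))
    labels = rng.integers(0, 2, size=200)

    X = np.array([texture_features(p) for p in patches])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    print(cross_val_score(clf, X, labels, cv=5).mean())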
Abstract:
In this paper we study the relevance of multiple kernel learning (MKL) for the automatic selection of time series inputs. Recently, MKL has gained great attention in the machine learning community due to its flexibility in modelling complex patterns and performing feature selection. In general, MKL constructs the kernel as a weighted linear combination of basis kernels, exploiting different sources of information. An efficient algorithm wrapping a Support Vector Regression model for optimizing the MKL weights, named SimpleMKL, is used for the analysis. In this sense, MKL performs feature selection by discarding inputs/kernels with low or null weights. The proposed approach is tested with simulated linear and nonlinear time series (AutoRegressive, Hénon and Lorenz series).
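A minimal sketch of the MKL construction described above, assuming a crude grid scan over the kernel weights in place of SimpleMKL's gradient-based optimization; the inputs, kernels and series are synthetic placeholders:

    import numpy as np
    from itertools import product
    from sklearn.metrics.pairwise import rbf_kernel
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    # Two candidate inputs (lagged series); only the first is informative.
    X1 = rng.normal(size=(120, 1))
    X2 = rng.normal(size=(120, 1))
    y = np.sin(3 * X1[:, 0]) + 0.1 * rng.normal(size=120)

    K1, K2 = rbf_kernel(X1), rbf_kernel(X2)  # basis kernels, one per input
    tr, te = np.arange(80), np.arange(80, 120)

    best = None
    for d1, d2 in product(np.linspace(0, 1, 11), repeat=2):
        if not np.isclose(d1 + d2, 1.0):  # simplex constraint on the weights
            continue
        K = d1 * K1 + d2 * K2  # weighted linear combination of basis kernels
        svr = SVR(kernel="precomputed").fit(K[np.ix_(tr, tr)], y[tr])
        err = np.mean((svr.predict(K[np.ix_(te, tr)]) - y[te]) ** 2)
        if best is None or err < best[0]:
            best = (err, d1, d2)
    print("kernel weights:", best[1:])  # a near-zero weight on K2 discards that input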
Abstract:
This paper presents a review of methodology for semi-supervised modeling with kernel methods, when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as the complex topographies of mountainous regions, where environmental processes are strongly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning by using digital elevation models in semi-supervised kernel methods. The range of tools and methodological issues discussed in the study includes feature selection and semi-supervised Support Vector algorithms. A real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.
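The paper's own algorithms are semi-supervised Support Vector methods; as a hedged stand-in built on the same manifold assumption, the sketch below uses scikit-learn's graph-based LabelSpreading with hypothetical DEM-derived features (elevation, slope), where unlabeled sites carry the conventional label -1:

    import numpy as np
    from sklearn.semi_supervised import LabelSpreading

    rng = np.random.default_rng(0)
    # Hypothetical monitoring sites described by DEM-derived features:
    # columns = (elevation, slope); most sites are unlabeled (-1).
    X = rng.normal(size=(300, 2))
    y = np.full(300, -1)
    labeled = rng.choice(300, size=30, replace=False)
    y[labeled] = (X[labeled, 0] > 0).astype(int)  # toy labeling rule

    model = LabelSpreading(kernel="rbf", gamma=5.0)
    model.fit(X, y)                    # propagates labels along the data manifold
    print(model.transduction_[:10])    # inferred labels for all sites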
Abstract:
Recently, kernel-based machine learning methods have gained great popularity in many data analysis and data mining fields: pattern recognition, biocomputing, speech and vision, engineering, remote sensing, etc. The paper describes the use of kernel methods to process large datasets from environmental monitoring networks. Several typical problems of the environmental sciences, and the solutions provided by kernel-based methods, are considered: classification of categorical data (soil type classification), mapping of continuous environmental and pollution information (soil pollution by radionuclides), and mapping with auxiliary information (climatic data from the Aral Sea region). Promising developments, such as automatic emergency hot-spot detection and monitoring network optimization, are discussed as well.
Abstract:
This article presents an experimental study of the classification ability of several classifiers for multi-class classification of cannabis seedlings. As the cultivation of drug-type cannabis is forbidden in Switzerland, law enforcement authorities regularly ask forensic laboratories to determine the chemotype of a seized cannabis plant and then to conclude whether the plantation is legal or not. This classification is mainly performed when the plant is mature, as required by the EU official protocol, so the classification of cannabis seedlings is a time-consuming and costly procedure. A previous study by the authors investigated this problem [1] and showed that it is possible to differentiate between drug-type (illegal) and fibre-type (legal) cannabis at an early stage of growth using gas chromatography interfaced with mass spectrometry (GC-MS), based on the relative proportions of eight major leaf compounds. The aims of the present work are, on the one hand, to continue the former work and optimize the methodology for the discrimination of drug- and fibre-type cannabis developed in the previous study and, on the other hand, to investigate the possibility of predicting illegal cannabis varieties. Seven classifiers for differentiating between cannabis seedlings are evaluated in this paper, namely Linear Discriminant Analysis (LDA), Partial Least Squares Discriminant Analysis (PLS-DA), Nearest Neighbour Classification (NNC), Learning Vector Quantization (LVQ), Radial Basis Function Support Vector Machines (RBF SVMs), Random Forest (RF) and Artificial Neural Networks (ANN). The performance of each method was assessed using the same analytical dataset, which consists of 861 samples split into drug- and fibre-type cannabis, with drug-type cannabis being made up of 12 varieties (i.e. 12 classes). The results show that linear classifiers are not able to manage the distribution of classes, in which some overlap areas exist, for either classification problem. Unlike linear classifiers, NNC and RBF SVMs best differentiate cannabis samples for both 2-class and 12-class classification, with average classification results up to 99% and 98%, respectively. Furthermore, RBF SVMs correctly classified as drug-type cannabis an independent validation set consisting of cannabis plants coming from police seizures. For forensic casework, this study shows that the discrimination between cannabis samples at an early stage of growth is possible with fairly high classification performance, both for discriminating between cannabis chemotypes and between drug-type cannabis varieties.
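As a hedged sketch of such a multi-classifier comparison (not the authors' protocol; the eight leaf-compound proportions and variety labels are random placeholders), several of the named classifiers can be cross-validated on one shared feature matrix with scikit-learn:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(861, 8))       # placeholder: 8 leaf-compound proportions
    y = rng.integers(0, 12, size=861)   # placeholder: 12 drug-type varieties

    classifiers = {
        "LDA": LinearDiscriminantAnalysis(),
        "NNC": KNeighborsClassifier(n_neighbors=5),
        "RBF SVM": SVC(kernel="rbf", C=10.0),
        "RF": RandomForestClassifier(n_estimators=200, random_state=0),
        "ANN": MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    }
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X, y, cv=5)   # 5-fold CV accuracy per classifier
        print(f"{name}: {scores.mean():.3f}")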
Abstract:
The present study deals with the analysis and mapping of Swiss franc interest rates. Interest rates depend on time and maturity, defining the term structure of the interest rate curves (IRC). In the present study, IRC are considered in a two-dimensional feature space: time and maturity. Exploratory data analysis includes a variety of tools widely used in econophysics and geostatistics. Geostatistical models and machine learning algorithms (multilayer perceptron and Support Vector Machines) were applied to produce interest rate maps. IR maps can be used for visualisation and pattern perception, to develop and explore economic hypotheses, to produce dynamic asset-liability simulations, and for financial risk assessments. The feasibility of applying the interest rate mapping approach to IRC forecasting is considered as well.
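A minimal sketch of machine-learning-based interest rate mapping in the (time, maturity) feature space, assuming a synthetic rate surface and an SVR in place of the paper's exact model setup: fit on scattered observations, then predict on a regular grid to obtain the map.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    # Scattered observations in the (time, maturity) feature space.
    t = rng.uniform(0, 5, 400)          # time, years
    m = rng.uniform(0.25, 10, 400)      # maturity, years
    rate = 1.5 + 0.8 * np.log1p(m) - 0.2 * t + 0.05 * rng.normal(size=400)

    model = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.01))
    model.fit(np.column_stack([t, m]), rate)

    # Evaluate on a regular grid: this grid of predictions is the IR "map".
    tt, mm = np.meshgrid(np.linspace(0, 5, 50), np.linspace(0.25, 10, 50))
    ir_map = model.predict(np.column_stack([tt.ravel(), mm.ravel()])).reshape(tt.shape)
    print(ir_map.shape)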
Abstract:
The quality of environmental data analysis and the propagation of errors are heavily affected by the representativity of the initial sampling design [CRE 93, DEU 97, KAN 04a, LEN 06, MUL 07]. Geostatistical methods such as kriging are based on field samples, whose spatial distribution is crucial for the correct detection of the phenomena. The literature on the design of environmental monitoring networks (MN) is widespread, and several interesting books have recently been published [GRU 06, LEN 06, MUL 07] to clarify the basic principles of spatial sampling design; an approach to monitoring network optimization based on Support Vector Machines has also been proposed. Nonetheless, modelers often receive real data coming from environmental monitoring networks that suffer from problems of non-homogeneity (clustering). Clustering can be related to preferential sampling or to the impossibility of reaching certain regions.
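One standard geostatistical remedy for such clustered networks, offered as an illustrative sketch rather than a method from the cited books, is cell declustering: overlay a regular grid and down-weight samples that share a cell, so that preferentially sampled areas do not dominate the estimated statistics.

    import numpy as np

    def cell_declustering_weights(coords, cell_size):
        """Weight each sample by 1 / (number of samples in its grid cell), normalized."""
        cells = np.floor(coords / cell_size).astype(int)
        _, inverse, counts = np.unique(cells, axis=0, return_inverse=True, return_counts=True)
        w = 1.0 / counts[inverse]
        return w / w.sum()

    rng = np.random.default_rng(0)
    sparse = rng.uniform(0, 100, size=(50, 2))                  # background network
    cluster = rng.normal(loc=[20, 20], scale=2, size=(50, 2))   # preferentially sampled spot
    coords = np.vstack([sparse, cluster])
    values = coords[:, 0] * 0.1 + rng.normal(size=100)

    w = cell_declustering_weights(coords, cell_size=10.0)
    print("naive mean:", values.mean(), "declustered mean:", np.sum(w * values))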
Abstract:
Intensity Modulated RadioTherapy (IMRT) is a treatment technique that uses modulated beam fluence. IMRT is now widespread in industrialized countries, due to its improvement of dose conformation around the target volume and its ability to lower doses to organs at risk in complex clinical cases. One way to carry out beam modulation is to sum smaller beams (beamlets) with the same incidence. This technique is called step-and-shoot IMRT. In a clinical context, it is necessary to verify treatment plans before the first irradiation. Plan verification is still an unresolved issue for this technique.
An independent monitor unit calculation (representative of the weight of each beamlet) can indeed not be performed for step-and-shoot IMRT, because beamlet weights are not known a priori but are calculated by inverse planning. Besides, treatment plan verification by comparison with measured data is time-consuming and is performed in a simple geometry, usually a cubic water phantom with all machine angles set to zero. In this work, an independent method for monitor unit calculation for step-and-shoot IMRT is described. This method is based on the Monte Carlo code EGSnrc/BEAMnrc. The Monte Carlo model of the head of the linear accelerator was validated by comparing simulated and measured dose distributions in a large range of situations. The beamlets of an IMRT treatment plan are calculated individually by Monte Carlo, in the exact geometry of the treatment. Then, the dose distributions of the beamlets are converted into absorbed dose to water per monitor unit. The dose of the whole treatment in each volume element (voxel) can be expressed through a linear matrix equation relating the monitor units and the dose per monitor unit of every beamlet. This equation is solved by a Non-Negative Least Squares (NNLS) fit algorithm. However, not every voxel inside the patient volume can be used to solve this equation, because of computer limitations. Several ways of selecting voxels were tested, and the best choice consists in using the voxels inside the Planning Target Volume (PTV). The method presented in this work was tested with eight clinical cases representative of usual radiotherapy treatments. The monitor units obtained lead to clinically equivalent global dose distributions. Thus, this independent monitor unit calculation method for step-and-shoot IMRT is validated and can therefore be used in clinical routine. It would be possible to consider applying a similar method to other treatment modalities, such as tomotherapy or volumetric modulated arc therapy.
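The matrix equation described above has the form d = A u, where column j of A holds the dose per monitor unit of beamlet j in the selected voxels and u is the non-negative vector of monitor units. As a hedged illustration with made-up dimensions and doses, SciPy's nnls solves exactly this kind of system:

    import numpy as np
    from scipy.optimize import nnls

    rng = np.random.default_rng(0)
    n_voxels, n_beamlets = 500, 12      # e.g. voxels inside the PTV, beamlets of the plan
    # A[i, j] = dose per monitor unit of beamlet j in voxel i (placeholder values).
    A = rng.uniform(0.0, 1.0, size=(n_voxels, n_beamlets))
    u_true = rng.uniform(0.0, 50.0, size=n_beamlets)   # "true" monitor units
    d = A @ u_true                                     # total dose per voxel

    u_fit, residual = nnls(A, d)        # least squares with u >= 0 enforced
    print(np.allclose(u_fit, u_true, atol=1e-6), residual)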
Abstract:
In the present work, we propose a feature-reduction system for a facial biometric identification system, using transform domains such as the discrete cosine transform (DCT) and the discrete wavelet transform (DWT) for parameterization, and Support Vector Machines (SVM) and Neural Networks (NN) as classifiers. The dimensionality reduction has been done with Principal Component Analysis (PCA) and with Independent Component Analysis (ICA). The system achieves similar success rates, about 98%, for both the DWT-SVM system and the DWT-PCA-SVM system. The computational load in training mode is reduced, owing to the smaller input size and the lower complexity of the classifier.
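A minimal sketch of a DWT-PCA-SVM chain of the kind described, assuming PyWavelets for the transform and placeholder images, sizes and component counts: the DWT approximation sub-band parameterizes each face, PCA reduces that vector, and an SVM classifies it.

    import numpy as np
    import pywt
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    def dwt_features(img):
        """Level-1 2-D DWT; keep the approximation sub-band as the feature vector."""
        cA, (cH, cV, cD) = pywt.dwt2(img, "haar")
        return cA.ravel()

    rng = np.random.default_rng(0)
    faces = rng.random(size=(100, 32, 32))   # placeholder face images
    ids = rng.integers(0, 10, size=100)      # placeholder subject identities

    X = np.array([dwt_features(f) for f in faces])   # 100 samples x 256 features
    clf = make_pipeline(PCA(n_components=40), SVC(kernel="rbf"))
    clf.fit(X, ids)
    print(clf.predict(X[:5]))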
Abstract:
Despite the high degree of automation in the turning industry, a few key problems prevent the complete automation of turning. One of these problems is tool wear. This work focuses on implementing an automatic system for measuring wear, in particular flank wear, by means of machine vision. The wear measurement system removes the need for manual measurement and minimizes the time spent on measuring tool wear. In addition to measurement, the modeling and prediction of wear are studied. The automatic measurement system was placed inside the lathe and was successfully integrated with external systems. The experiments showed that the measurement system is able to measure tool wear in the system's real operating environment. The measurement system can also withstand disturbances that are common for machine vision systems. Tool wear modeling was studied with several different methods, including neural networks and support vector regression. The experiments showed that the studied models were able to predict the degree of tool wear from the elapsed machining time. The best results were obtained with neural networks using Bayesian regularization.
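As a hedged sketch of the wear-modeling step (the thesis compares, among other methods, neural networks and support vector regression; the wear curve below is synthetic), flank wear can be regressed on machining time with scikit-learn's SVR:

    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    time = np.sort(rng.uniform(0, 60, 80))                # machining time, minutes
    # Synthetic flank-wear curve: fast initial wear, then a steadier region.
    wear = 0.05 * np.sqrt(time) + 0.002 * time + 0.01 * rng.normal(size=80)

    svr = SVR(kernel="rbf", C=10.0, epsilon=0.005)
    svr.fit(time.reshape(-1, 1), wear)
    print(svr.predict([[30.0], [45.0]]))   # predicted wear at the given times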
Abstract:
Acetylation was performed to reduce the polarity of wood and increase its compatibility with polymer matrices for the production of composites. These reactions were performed first as a function of the acetic acid and anhydride concentrations in a mixture catalyzed by sulfuric acid. A concentration of 50%/50% (v/v) of acetic acid and anhydride was found to produce the highest conversion rate between the functional groups. After these reactions, the kinetics were investigated by varying times and temperatures using a 3² factorial design, which showed that time was the most relevant parameter in determining the conversion of hydroxyl into carbonyl groups.
Abstract:
The influence of vine water status was studied in commercial vineyard blocks of Vitis vinifera L. cv. Cabernet Franc in the Niagara Peninsula, Ontario, from 2005 to 2007. Vine performance, fruit composition and vine size of non-irrigated grapevines were compared within ten vineyard blocks containing different soil and vine water status. Results showed that within each vineyard block, water status zones could be identified on GIS-generated maps using leaf water potential and soil moisture measurements. Some yield and fruit composition variables correlated with the intensity of vine water status. Chemical and descriptive sensory analysis was performed on nine (2005) and eight (2006) pairs of experimental wines to illustrate differences between wines made from high and low water status winegrapes at each vineyard block. Twelve trained judges evaluated six aroma and flavor attributes (red fruit, black cherry, black currant, black pepper, bell pepper, and green bean), three mouthfeel attributes (astringency, bitterness and acidity), as well as color intensity. Each pair of high and low water status wines was compared using a t-test. In 2005, low water status (LWS) wines from Buis, Harbour Estate, Henry of Pelham (HOP), and Vieni had higher color intensity; those from Chateau des Charmes (CDC) had high black cherry flavor; those from Rief Estates were high in red fruit flavor; and those from the George site were high in red fruit aroma. In 2006, LWS wines from the George, Cave Spring and Morrison sites were high in color intensity. LWS wines from CDC, George and Morrison were more intense in black cherry aroma; LWS wines from the Hernder site were high in red fruit aroma and flavor. No significant differences were found from one year to the next between the wines produced from the same vineyard, indicating that the attributes of these wines were maintained almost constant despite markedly different conditions in the 2005 and 2006 vintages. Partial Least Squares (PLS) analysis showed that leaf water potential was associated with red fruit aroma and flavor, berry and wine color intensity, total phenols, Brix and anthocyanins, while soil moisture was explained by acidity, green bean aroma and flavor, as well as bell pepper aroma and flavor. In another study, chemical and descriptive sensory analysis was conducted on nine (2005) and eight (2006) medium water status (MWS) experimental wines to illustrate differences that might support the sub-appellation system in Niagara. The judges evaluated the same aroma, flavor, and mouthfeel sensory attributes as well as color intensity. Data were analyzed using analysis of variance (ANOVA), principal component analysis (PCA) and discriminant analysis (DA). ANOVA of the sensory data showed regional differences for all sensory attributes. In 2005, wines from the CDC, HOP, and Hernder sites showed the highest red fruit aroma and flavor. Wines from the Lakeshore and Niagara River sites (Harbour, Reif, George, and Buis) showed higher bell pepper and green bean aroma and flavor, due to proximity to the large bodies of water and lower heat unit accumulation. In 2006, all sensory attributes except black pepper aroma were different. PCA revealed that wines from the HOP and CDC sites were higher in red fruit, black currant and black cherry aroma and flavor as well as black pepper flavor, while wines from the Hernder, Morrison and George sites were high in green bean aroma and flavor. ANOVA of the chemical data in 2005 indicated that hue, color intensity, and titratable acidity (TA) were different across the sites, while in 2006, hue, color intensity and ethanol were different across the sites. These data indicate the likelihood of substantial chemical and sensory differences between clusters of sub-appellations within the Niagara Peninsula.
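As a hedged illustration of the PLS step with synthetic numbers rather than the study's data, PLS regression relates the measured predictors (leaf water potential, soil moisture) to sensory and chemical responses, and the fitted coefficients indicate which responses each predictor explains:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(0)
    n = 60
    leaf_psi = rng.normal(size=n)        # leaf water potential (standardized)
    soil_moist = rng.normal(size=n)      # soil moisture (standardized)
    X = np.column_stack([leaf_psi, soil_moist])

    # Synthetic responses loosely following the reported associations:
    # color intensity tracks leaf water potential, green bean flavor tracks soil moisture.
    Y = np.column_stack([
        1.0 * leaf_psi + 0.1 * rng.normal(size=n),     # color intensity
        0.9 * soil_moist + 0.1 * rng.normal(size=n),   # green bean flavor
    ])

    pls = PLSRegression(n_components=2).fit(X, Y)
    print(pls.score(X, Y))   # R^2 of the two-block model
    print(pls.coef_)         # predictor-to-response coefficients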
Abstract:
This is a Named Entity based Question Answering System for the Malayalam language. Although a vast amount of information is available today in digital form, no effective information access mechanism exists to provide humans with convenient access to it. Information Retrieval and Question Answering systems are the two mechanisms now available for information access. Information retrieval systems typically return a long list of documents in response to a user's query, which the user must skim to determine whether they contain an answer. A Question Answering system, by contrast, allows the user to state his or her information need as a natural language question and returns the most appropriate answer as a word, a sentence or a paragraph. This system is based on Named Entity Tagging and Question Classification. Document tagging extracts useful information from the documents that will be used in finding the answer to the question. Question Classification extracts useful information from the question to determine the type of the question and the way in which it is to be answered. Various machine learning methods are used to tag the documents, and a rule-based approach is used for Question Classification. Malayalam belongs to the Dravidian family of languages and is one of the four major languages of this family. It is one of the 22 Scheduled Languages of India, with official language status in the state of Kerala, and is spoken by 40 million people. Malayalam is a morphologically rich, agglutinative language with relatively free word order. It also has a productive morphology that allows the creation of complex words, which are often highly ambiguous. Document tagging tools such as a Part-of-Speech Tagger, Phrase Chunker, Named Entity Tagger, and Compound Word Splitter were developed as part of this research work; no such tools were previously available for Malayalam. Finite State Transducers, higher-order Conditional Random Fields, Artificial Immune System principles, and Support Vector Machines are the techniques used in the design of these document preprocessing tools. This research work describes how Named Entities are used to represent the documents. Single-sentence questions are used to test the system. The overall precision and recall obtained are 88.5% and 85.9%, respectively. This work can be extended in several directions: the coverage of non-factoid questions can be increased, and the system can be extended to open-domain applications. Reference resolution and word sense disambiguation techniques are suggested as future enhancements.
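The thesis gives no code; as a hedged, language-agnostic sketch of rule-based question classification of the kind described (English cue words stand in for the actual Malayalam interrogatives), a classifier can map surface cue words to expected answer types, which in turn select the named-entity tag to search for in the tagged documents:

    # Hypothetical cue-word rules mapping question words to expected answer types,
    # which select the named-entity tag to look for in the tagged documents.
    RULES = [
        ("who", "PERSON"),
        ("where", "LOCATION"),
        ("when", "DATE"),
        ("how many", "NUMBER"),
        ("what", "THING"),
    ]

    def classify_question(question: str) -> str:
        q = question.lower()
        for cue, answer_type in RULES:
            if cue in q:
                return answer_type
        return "UNKNOWN"   # fall through to a default retrieval strategy

    print(classify_question("Who founded the university?"))   # -> PERSON
    print(classify_question("When was it established?"))      # -> DATE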