778 resultados para machine learning modelli lineari missing data biomarcatori
Resumo:
There is an increasing interest in the application of Evolutionary Algorithms (EAs) to induce classification rules. This hybrid approach can benefit areas where classical methods for rule induction have not been very successful. One example is the induction of classification rules in imbalanced domains. Imbalanced data occur when one or more classes heavily outnumber other classes. Frequently, classical machine learning (ML) classifiers are not able to learn in the presence of imbalanced data sets, inducing classification models that always predict the most numerous classes. In this work, we propose a novel hybrid approach to deal with this problem. We create several balanced data sets with all minority class cases and a random sample of majority class cases. These balanced data sets are fed to classical ML systems that produce rule sets. The rule sets are combined creating a pool of rules and an EA is used to build a classifier from this pool of rules. This hybrid approach has some advantages over undersampling, since it reduces the amount of discarded information, and some advantages over oversampling, since it avoids overfitting. The proposed approach was experimentally analysed and the experimental results show an improvement in the classification performance measured as the area under the receiver operating characteristics (ROC) curve.
Resumo:
Establishing metrics to assess machine translation (MT) systems automatically is now crucial owing to the widespread use of MT over the web. In this study we show that such evaluation can be done by modeling text as complex networks. Specifically, we extend our previous work by employing additional metrics of complex networks, whose results were used as input for machine learning methods and allowed MT texts of distinct qualities to be distinguished. Also shown is that the node-to-node mapping between source and target texts (English-Portuguese and Spanish-Portuguese pairs) can be improved by adding further hierarchical levels for the metrics out-degree, in-degree, hierarchical common degree, cluster coefficient, inter-ring degree, intra-ring degree and convergence ratio. The results presented here amount to a proof-of-principle that the possible capturing of a wider context with the hierarchical levels may be combined with machine learning methods to yield an approach for assessing the quality of MT systems. (C) 2010 Elsevier B.V. All rights reserved.
Resumo:
Felice Gigante a graduate from the New York Trade School Electronics program works on a machine in his job as Data Processing Customer Engineer for the International Business Machines Corp. Original caption reads, "Felice Gigante - Electronices, International Business Machines Corp." Black and white photograph with caption glued to reverse.
Resumo:
Intelligent Transportation System (ITS) is a system that builds a safe, effective and integrated transportation environment based on advanced technologies. Road signs detection and recognition is an important part of ITS, which offer ways to collect the real time traffic data for processing at a central facility.This project is to implement a road sign recognition model based on AI and image analysis technologies, which applies a machine learning method, Support Vector Machines, to recognize road signs. We focus on recognizing seven categories of road sign shapes and five categories of speed limit signs. Two kinds of features, binary image and Zernike moments, are used for representing the data to the SVM for training and test. We compared and analyzed the performances of SVM recognition model using different features and different kernels. Moreover, the performances using different recognition models, SVM and Fuzzy ARTMAP, are observed.
Resumo:
Objective: To investigate whether spirography-based objective measures are able to effectively characterize the severity of unwanted symptom states (Off and dyskinesia) and discriminate them from motor state of healthy elderly subjects. Background: Sixty-five patients with advanced Parkinson’s disease (PD) and 10 healthy elderly (HE) subjects performed repeated assessments of spirography, using a touch screen telemetry device in their home environments. On inclusion, the patients were either treated with levodopa-carbidopa intestinal gel or were candidates for switching to this treatment. On each test occasion, the subjects were asked trace a pre-drawn Archimedes spiral shown on the screen, using an ergonomic pen stylus. The test was repeated three times and was performed using dominant hand. A clinician used a web interface which animated the spiral drawings, allowing him to observe different kinematic features, like accelerations and spatial changes, during the drawing process and to rate different motor impairments. Initially, the motor impairments of drawing speed, irregularity and hesitation were rated on a 0 (normal) to 4 (extremely severe) scales followed by marking the momentary motor state of the patient into 2 categories that is Off and Dyskinesia. A sample of spirals drawn by HE subjects was randomly selected and used in subsequent analysis. Methods: The raw spiral data, consisting of stylus position and timestamp, were processed using time series analysis techniques like discrete wavelet transform, approximate entropy and dynamic time warping in order to extract 13 quantitative measures for representing meaningful motor impairment information. A principal component analysis (PCA) was used to reduce the dimensions of the quantitative measures into 4 principal components (PC). In order to classify the motor states into 3 categories that is Off, HE and dyskinesia, a logistic regression model was used as a classifier to map the 4 PCs to the corresponding clinically assigned motor state categories. A stratified 10-fold cross-validation (also known as rotation estimation) was applied to assess the generalization ability of the logistic regression classifier to future independent data sets. To investigate mean differences of the 4 PCs across the three categories, a one-way ANOVA test followed by Tukey multiple comparisons was used. Results: The agreements between computed and clinician ratings were very good with a weighted area under the receiver operating characteristic curve (AUC) coefficient of 0.91. The mean PC scores were different across the three motor state categories, only at different levels. The first 2 PCs were good at discriminating between the motor states whereas the PC3 was good at discriminating between HE subjects and PD patients. The mean scores of PC4 showed a trend across the three states but without significant differences. The Spearman’s rank correlations between the first 2 PCs and clinically assessed motor impairments were as follows: drawing speed (PC1, 0.34; PC2, 0.83), irregularity (PC1, 0.17; PC2, 0.17), and hesitation (PC1, 0.27; PC2, 0.77). Conclusions: These findings suggest that spirography-based objective measures are valid measures of spatial- and time-dependent deficits and can be used to distinguish drug-related motor dysfunctions between Off and dyskinesia in PD. These measures can be potentially useful during clinical evaluation of individualized drug-related complications such as over- and under-medications thus maximizing the amount of time the patients spend in the On state.
Resumo:
Objective: To investigate whether advanced visualizations of spirography-based objective measures are useful in differentiating drug-related motor dysfunctions between Off and dyskinesia in Parkinson’s disease (PD). Background: During the course of a 3 year longitudinal clinical study, in total 65 patients (43 males and 22 females with mean age of 65) with advanced PD and 10 healthy elderly (HE) subjects (5 males and 5 females with mean age of 61) were assessed. Both patients and HE subjects performed repeated and time-stamped assessments of their objective health indicators using a test battery implemented on a telemetry touch screen handheld computer, in their home environment settings. Among other tasks, the subjects were asked to trace a pre-drawn Archimedes spiral using the dominant hand and repeat the test three times per test occasion. Methods: A web-based framework was developed to enable a visual exploration of relevant spirography-based kinematic features by clinicians so they can in turn evaluate the motor states of the patients i.e. Off and dyskinesia. The system uses different visualization techniques such as time series plots, animation, and interaction and organizes them into different views to aid clinicians in measuring spatial and time-dependent irregularities that could be associated with the motor states. Along with the animation view, the system displays two time series plots for representing drawing speed (blue line) and displacement from ideal trajectory (orange line). The views are coordinated and linked i.e. user interactions in one of the views will be reflected in other views. For instance, when the user points in one of the pixels in the spiral view, the circle size of the underlying pixel increases and a vertical line appears in the time series views to depict the corresponding position. In addition, in order to enable clinicians to observe erratic movements more clearly and thus improve the detection of irregularities, the system displays a color-map which gives an idea of the longevity of the spirography task. Figure 2 shows single randomly selected spirals drawn by a: A) patient who experienced dyskinesias, B) HE subject, and C) patient in Off state. Results: According to a domain expert (DN), the spirals drawn in the Off and dyskinesia motor states are characterized by different spatial and time features. For instance, the spiral shown in Fig. 2A was drawn by a patient who showed symptoms of dyskinesia; the drawing speed was relatively high (cf. blue-colored time series plot and the short timestamp scale in the x axis) and the spatial displacement was high (cf. orange-colored time series plot) associated with smooth deviations as a result of uncontrollable movements. The patient also exhibited low amount of hesitation which could be reflected both in the animation of the spiral as well as time series plots. In contrast, the patient who was in the Off state exhibited different kinematic features, as shown in Fig. 2C. In the case of spirals drawn by a HE subject, there was a great precision during the drawing process as well as unchanging levels of time-dependent features over the test trial, as seen in Fig. 2B. Conclusions: Visualizing spirography-based objective measures enables identification of trends and patterns of drug-related motor dysfunctions at the patient’s individual level. Dynamic access of visualized motor tests may be useful during the evaluation of drug-related complications such as under- and over-medications, providing decision support to clinicians during evaluation of treatment effects as well as improve the quality of life of patients and their caregivers. In future, we plan to evaluate the proposed approach by assessing within- and between-clinician variability in ratings in order to determine its actual usefulness and then use these ratings as target outcomes in supervised machine learning, similarly as it was previously done in the study performed by Memedi et al. (2013).
Resumo:
A challenge for the clinical management of advanced Parkinson’s disease (PD) patients is the emergence of fluctuations in motor performance, which represents a significant source of disability during activities of daily living of the patients. There is a lack of objective measurement of treatment effects for in-clinic and at-home use that can provide an overview of the treatment response. The objective of this paper was to develop a method for objective quantification of advanced PD motor symptoms related to off episodes and peak dose dyskinesia, using spiral data gathered by a touch screen telemetry device. More specifically, the aim was to objectively characterize motor symptoms (bradykinesia and dyskinesia), to help in automating the process of visual interpretation of movement anomalies in spirals as rated by movement disorder specialists. Digitized upper limb movement data of 65 advanced PD patients and 10 healthy (HE) subjects were recorded as they performed spiral drawing tasks on a touch screen device in their home environment settings. Several spatiotemporal features were extracted from the time series and used as inputs to machine learning methods. The methods were validated against ratings on animated spirals scored by four movement disorder specialists who visually assessed a set of kinematic features and the motor symptom. The ability of the method to discriminate between PD patients and HE subjects and the test-retest reliability of the computed scores were also evaluated. Computed scores correlated well with mean visual ratings of individual kinematic features. The best performing classifier (Multilayer Perceptron) classified the motor symptom (bradykinesia or dyskinesia) with an accuracy of 84% and area under the receiver operating characteristics curve of 0.86 in relation to visual classifications of the raters. In addition, the method provided high discriminating power when distinguishing between PD patients and HE subjects as well as had good test-retest reliability. This study demonstrated the potential of using digital spiral analysis for objective quantification of PD-specific and/or treatment-induced motor symptoms.
Resumo:
O objetivo deste trabalho é testar a aplicação de um modelo gráfico probabilístico, denominado genericamente de Redes Bayesianas, para desenvolver modelos computacionais que possam ser utilizados para auxiliar a compreensão de problemas e/ou na previsão de variáveis de natureza econômica. Com este propósito, escolheu-se um problema amplamente abordado na literatura e comparou-se os resultados teóricos e experimentais já consolidados com os obtidos utilizando a técnica proposta. Para tanto,foi construído um modelo para a classificação da tendência do "risco país" para o Brasil a partir de uma base de dados composta por variáveis macroeconômicas e financeiras. Como medida do risco adotou-se o EMBI+ (Emerging Markets Bond Index Plus), por ser um indicador amplamente utilizado pelo mercado.
Resumo:
Multi-factor models constitute a useful tool to explain cross-sectional covariance in equities returns. We propose in this paper the use of irregularly spaced returns in the multi-factor model estimation and provide an empirical example with the 389 most liquid equities in the Brazilian Market. The market index shows itself significant to explain equity returns while the US$/Brazilian Real exchange rate and the Brazilian standard interest rate does not. This example shows the usefulness of the estimation method in further using the model to fill in missing values and to provide interval forecasts.
Resumo:
Multi-factor models constitute a use fui tool to explain cross-sectional covariance in equities retums. We propose in this paper the use of irregularly spaced returns in the multi-factor model estimation and provide an empirical example with the 389 most liquid equities in the Brazilian Market. The market index shows itself significant to explain equity returns while the US$/Brazilian Real exchange rate and the Brazilian standard interest rate does not. This example shows the usefulness of the estimation method in further using the model to fill in missing values and to provide intervaI forecasts.
Resumo:
The Support Vector Machines (SVM) has attracted increasing attention in machine learning area, particularly on classification and patterns recognition. However, in some cases it is not easy to determinate accurately the class which given pattern belongs. This thesis involves the construction of a intervalar pattern classifier using SVM in association with intervalar theory, in order to model the separation of a pattern set between distinct classes with precision, aiming to obtain an optimized separation capable to treat imprecisions contained in the initial data and generated during the computational processing. The SVM is a linear machine. In order to allow it to solve real-world problems (usually nonlinear problems), it is necessary to treat the pattern set, know as input set, transforming from nonlinear nature to linear problem. The kernel machines are responsible to do this mapping. To create the intervalar extension of SVM, both for linear and nonlinear problems, it was necessary define intervalar kernel and the Mercer s theorem (which caracterize a kernel function) to intervalar function
Resumo:
Equipment maintenance is the major cost factor in industrial plants, it is very important the development of fault predict techniques. Three-phase induction motors are key electrical equipments used in industrial applications mainly because presents low cost and large robustness, however, it isn t protected from other fault types such as shorted winding and broken bars. Several acquisition ways, processing and signal analysis are applied to improve its diagnosis. More efficient techniques use current sensors and its signature analysis. In this dissertation, starting of these sensors, it is to make signal analysis through Park s vector that provides a good visualization capability. Faults data acquisition is an arduous task; in this way, it is developed a methodology for data base construction. Park s transformer is applied into stationary reference for machine modeling of the machine s differential equations solution. Faults detection needs a detailed analysis of variables and its influences that becomes the diagnosis more complex. The tasks of pattern recognition allow that systems are automatically generated, based in patterns and data concepts, in the majority cases undetectable for specialists, helping decision tasks. Classifiers algorithms with diverse learning paradigms: k-Neighborhood, Neural Networks, Decision Trees and Naïves Bayes are used to patterns recognition of machines faults. Multi-classifier systems are used to improve classification errors. It inspected the algorithms homogeneous: Bagging and Boosting and heterogeneous: Vote, Stacking and Stacking C. Results present the effectiveness of constructed model to faults modeling, such as the possibility of using multi-classifiers algorithm on faults classification
Resumo:
One of the most important goals of bioinformatics is the ability to identify genes in uncharacterized DNA sequences on world wide database. Gene expression on prokaryotes initiates when the RNA-polymerase enzyme interacts with DNA regions called promoters. In these regions are located the main regulatory elements of the transcription process. Despite the improvement of in vitro techniques for molecular biology analysis, characterizing and identifying a great number of promoters on a genome is a complex task. Nevertheless, the main drawback is the absence of a large set of promoters to identify conserved patterns among the species. Hence, a in silico method to predict them on any species is a challenge. Improved promoter prediction methods can be one step towards developing more reliable ab initio gene prediction methods. In this work, we present an empirical comparison of Machine Learning (ML) techniques such as Na¨ýve Bayes, Decision Trees, Support Vector Machines and Neural Networks, Voted Perceptron, PART, k-NN and and ensemble approaches (Bagging and Boosting) to the task of predicting Bacillus subtilis. In order to do so, we first built two data set of promoter and nonpromoter sequences for B. subtilis and a hybrid one. In order to evaluate of ML methods a cross-validation procedure is applied. Good results were obtained with methods of ML like SVM and Naïve Bayes using B. subtilis. However, we have not reached good results on hybrid database
Resumo:
Reinforcement learning is a machine learning technique that, although finding a large number of applications, maybe is yet to reach its full potential. One of the inadequately tested possibilities is the use of reinforcement learning in combination with other methods for the solution of pattern classification problems. It is well documented in the literature the problems that support vector machine ensembles face in terms of generalization capacity. Algorithms such as Adaboost do not deal appropriately with the imbalances that arise in those situations. Several alternatives have been proposed, with varying degrees of success. This dissertation presents a new approach to building committees of support vector machines. The presented algorithm combines Adaboost algorithm with a layer of reinforcement learning to adjust committee parameters in order to avoid that imbalances on the committee components affect the generalization performance of the final hypothesis. Comparisons were made with ensembles using and not using the reinforcement learning layer, testing benchmark data sets widely known in area of pattern classification
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)