950 results for Cross-validation
Abstract:
Research objectives: Poker and responsible gambling both engage the executive functions (EF), which are higher-level cognitive abilities. The main objective of this work was to assess whether online poker players of different ability levels show different EF performance and, if so, which functions are the most discriminating. The secondary objective was to assess whether EF performance can predict the quality of gambling, according to the Gambling Related Cognition Scale (GRCS), the South Oaks Gambling Screen (SOGS) and the Problem Gambling Severity Index (PGSI).
Sample and methods: The study design consisted of two stages: 46 active Italian players (41 m, 5 f; age 32±7.1 yrs; education 14.8±3 yrs) completed the PGSI in a secure IT web system and uploaded their own hand history files, which were anonymized and then evaluated by two poker experts. Of these players, 36 (31 m, 5 f; age 33±7.3 yrs; education 15±3 yrs) agreed to take part in the second stage: the administration of an extensive neuropsychological test battery by a blinded trained professional. To answer the main research question, we collected all final and intermediate scores of the EF tests for each player, together with the playing-ability score. To answer the secondary research question, we referred to the GRCS, PGSI and SOGS scores. We determined which variables are good predictors of the playing-ability score using statistical techniques able to deal with many regressors and few observations (LASSO, best subset algorithms and CART). In this context, information criteria and cross-validation errors play a key role in the selection of the relevant regressors, while significance testing and goodness-of-fit measures can lead to wrong conclusions.
Preliminary findings: We found significant predictors of the poker ability score in various tests. In particular, there are good predictors 1) in some Wisconsin Card Sorting Test items that measure flexibility in choosing problem-solving strategies, strategic planning, modulation of impulsive responding, goal setting and self-monitoring; 2) in those Cognitive Estimates Test variables related to deductive reasoning, problem solving, development of an appropriate strategy and self-monitoring; 3) in the Emotional Quotient Inventory Short (EQ-i:S) Stress Management score, composed of the Stress Tolerance and Impulse Control scores, and in the Interpersonal score (Empathy, Social Responsibility, Interpersonal Relationship). As for the quality of gambling, some EQ-i:S scale scores provide the best predictors: General Mood for the PGSI; Intrapersonal (Self-Regard, Emotional Self-Awareness, Assertiveness, Independence, Self-Actualization) and Adaptability (Reality Testing, Flexibility, Problem Solving) for the SOGS; and Adaptability for the GRCS.
Implications for the field: Through PokerMapper we gathered knowledge and evaluated the feasibility of building short tasks/card games in online poker environments for profiling users' executive functions. These card games will be part of an IT system able to dynamically profile EF and provide players with feedback on their expected performance and ability to gamble responsibly at that particular moment. The implementation of such a system in existing gambling platforms could become an effective proactive tool for supporting responsible gambling.
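A minimal sketch of the variable-selection idea described above, with synthetic stand-ins for the EF scores and the playing-ability outcome: when there are far more regressors than observations, the LASSO penalty can be tuned by cross-validation error and, alternatively, by an information criterion (here BIC), rather than by significance tests.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LassoLarsIC
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_players, n_scores = 36, 50                      # few observations, many regressors
X = rng.normal(size=(n_players, n_scores))        # EF test scores (synthetic)
beta = np.zeros(n_scores); beta[:4] = [1.5, -1.0, 0.8, 0.6]
y = X @ beta + rng.normal(scale=0.5, size=n_players)  # playing-ability score (synthetic)

X = StandardScaler().fit_transform(X)

# Penalty chosen by cross-validation error ...
lasso_cv = LassoCV(cv=5).fit(X, y)
# ... and, alternatively, by the BIC information criterion.
lasso_bic = LassoLarsIC(criterion="bic").fit(X, y)

print("CV-selected regressors:", np.flatnonzero(lasso_cv.coef_))
print("BIC-selected regressors:", np.flatnonzero(lasso_bic.coef_))
```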
Abstract:
Jakarta is vulnerable to flooding, caused mainly by prolonged heavy rainfall, so robust hydrological modelling is called for. Good-quality spatial precipitation data are therefore needed to achieve a good hydrological model. Two types of rainfall sources are available: satellite observations and gauge station observations. At-site (gauge) rainfall is considered a reliable and accurate source of rainfall, but the limited number of stations makes spatial interpolation unappealing. Gridded satellite rainfall, on the other hand, now offers high spatial resolution and improved accuracy, yet remains less accurate than its gauge counterpart. To obtain a better precipitation data set, this study proposes cokriging, a blending algorithm, to yield a blended satellite-gauge gridded rainfall product at approximately 10-km resolution. The Global Satellite Mapping of Precipitation (GSMaP, 0.1°×0.1°) and daily rainfall observations from gauge stations are used. The blended product is compared with the satellite data by cross-validation. The newly produced blended product is then used to re-calibrate the hydrological model. Several scenarios are simulated by hydrological models calibrated with gauge observations alone and with the blended product. The performance of the two calibrated hydrological models is then assessed and compared on the basis of simulated and observed runoff.
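A hedged sketch of the cross-validation step, under stated assumptions: gauge locations, rainfall values and co-located satellite estimates are all synthetic, and simple inverse-distance weighting stands in for the cokriging interpolator (which would require a geostatistics package). Each gauge is withheld in turn, predicted from the remaining stations, and the errors are compared with the satellite product's errors at the same points.

```python
import numpy as np

rng = np.random.default_rng(1)
xy = rng.uniform(0, 100, size=(30, 2))        # gauge locations (km), synthetic
truth = 50 + 10 * np.sin(xy[:, 0] / 20)       # synthetic rainfall field
gauge = truth + rng.normal(0, 1, 30)          # accurate gauge observations
sat = truth + rng.normal(0, 5, 30)            # noisier satellite estimates

def idw(x0, xs, vs, p=2.0):
    """Inverse-distance-weighted estimate at x0 from stations xs with values vs."""
    d = np.linalg.norm(xs - x0, axis=1)
    w = 1.0 / np.maximum(d, 1e-9) ** p
    return np.sum(w * vs) / np.sum(w)

err_blend, err_sat = [], []
for i in range(len(gauge)):                   # withhold one station at a time
    mask = np.arange(len(gauge)) != i
    est = idw(xy[i], xy[mask], gauge[mask])
    err_blend.append(est - gauge[i])
    err_sat.append(sat[i] - gauge[i])

rmse = lambda e: np.sqrt(np.mean(np.square(e)))
print(f"RMSE interpolated: {rmse(err_blend):.2f}   RMSE satellite: {rmse(err_sat):.2f}")
```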
Abstract:
Fraud detection models are used to identify whether a transaction is legitimate or fraudulent on the basis of registration and transactional information. The technique proposed in the study presented in this dissertation is Bayesian Networks (BN); its results were compared with those of Logistic Regression (LR), a technique widely used in the market. The Bayesian networks evaluated were Bayesian classifiers with the Naive Bayes structure. The network structures were obtained from real data provided by a financial institution. The database was split into development and validation samples by cross-validation with ten partitions. Naive Bayes classifiers were chosen for their simplicity and efficiency. Model performance was evaluated by means of the confusion matrix and the area under the ROC curve. The analyses revealed slightly better performance of logistic regression compared with the Bayesian classifiers. Logistic regression was chosen as the most suitable model because it performed better at predicting fraudulent operations with respect to the confusion matrix. Based on the area under the ROC curve, logistic regression showed greater ability to discriminate the operations classified correctly from those that were not.
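A minimal sketch of this comparison, with a synthetic imbalanced data set standing in for the institution's records: both classifiers are scored by area under the ROC curve under ten-fold cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Fraud is the rare class (~5% of transactions) in this synthetic stand-in.
X, y = make_classification(n_samples=2000, n_features=15, weights=[0.95],
                           random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, model in [("Naive Bayes", GaussianNB()),
                    ("Logistic Regression", LogisticRegression(max_iter=1000))]:
    auc = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: mean ROC AUC = {auc.mean():.3f} (+/- {auc.std():.3f})")
```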
Abstract:
In this study the effect of the cultivar on the volatile profile of five different banana varieties was evaluated and determined by dynamic headspace solid-phase microextraction (dHS-SPME) combined with one-dimensional gas chromatography–mass spectrometry (1D-GC–qMS). This approach allowed the definition of a volatile metabolite profile for each banana variety, which can be used as a pertinent criterion of differentiation. The investigated banana varieties (Dwarf Cavendish, Prata, Maçã, Ouro and Platano) have certified botanical origin, belong to the Musaceae family and represent the genomic group most commonly cultivated on Madeira Island (Portugal). The influence of the dHS-SPME experimental factors, namely fibre coating, extraction time and extraction temperature, on the equilibrium headspace analysis was investigated and optimised using a univariate optimisation design. A total of 68 volatile organic metabolites (VOMs) were tentatively identified and used to profile the volatile composition of the different banana cultivars, emphasising the sensitivity and applicability of SPME for establishing the volatile metabolomic pattern of plant secondary metabolites. Ethyl esters were found to comprise the largest chemical class, accounting for 80.9%, 86.5%, 51.2%, 90.1% and 6.1% of the total peak area of the Dwarf Cavendish, Prata, Ouro, Maçã and Platano volatile fractions, respectively. Gas chromatographic peak areas were submitted to multivariate statistical analysis (principal component and stepwise linear discriminant analysis) in order to visualise clusters within samples and to detect the volatile metabolites able to differentiate banana cultivars. The application of multivariate analysis to the VOMs data set resulted in predictive abilities of 90%, as evaluated by the cross-validation procedure.
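A small sketch of the chemometric step, assuming a synthetic peak-area matrix in place of the real chromatographic data: principal component scores feed a linear discriminant model (the stepwise variable selection is omitted here), and predictive ability is estimated by leave-one-out cross-validation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n_per_cultivar, n_voms = 10, 68
cultivars = np.repeat(np.arange(5), n_per_cultivar)          # 5 banana varieties
# Synthetic peak areas with a mild cultivar-dependent shift.
X = rng.normal(size=(cultivars.size, n_voms)) + cultivars[:, None] * 0.4

model = make_pipeline(StandardScaler(), PCA(n_components=10),
                      LinearDiscriminantAnalysis())
acc = cross_val_score(model, X, cultivars, cv=LeaveOneOut())
print(f"cross-validated predictive ability: {acc.mean():.0%}")
```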
Resumo:
Allergic asthma represents an important public health issue, most common in the paediatric population, characterized by airway inflammation that may lead to changes in volatiles secreted via the lungs. Thus, exhaled breath has potential to be a matrix with relevant metabolomic information to characterize this disease. Progress in biochemistry, health sciences and related areas depends on instrumental advances, and a high throughput and sensitive equipment such as comprehensive two-dimensional gas chromatography–time of flight mass spectrometry (GC × GC–ToFMS) was considered. GC × GC–ToFMS application in the analysis of the exhaled breath of 32 children with allergic asthma, from which 10 had also allergic rhinitis, and 27 control children allowed the identification of several hundreds of compounds belonging to different chemical families. Multivariate analysis, using Partial Least Squares-Discriminant Analysis in tandem with Monte Carlo Cross Validation was performed to assess the predictive power and to help the interpretation of recovered compounds possibly linked to oxidative stress, inflammation processes or other cellular processes that may characterize asthma. The results suggest that the model is robust, considering the high classification rate, sensitivity, and specificity. A pattern of six compounds belonging to the alkanes characterized the asthmatic population: nonane, 2,2,4,6,6-pentamethylheptane, decane, 3,6-dimethyldecane, dodecane, and tetradecane. To explore future clinical applications, and considering the future role of molecular-based methodologies, a compound set was established to rapid access of information from exhaled breath, reducing the time of data processing, and thus, becoming more expedite method for the clinical purposes.
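A sketch of PLS-DA with Monte Carlo cross-validation, under stated assumptions: the breath-compound matrix is synthetic, and PLS regression on a 0/1 class label, thresholded at 0.5, is used as a common stand-in for PLS-DA. Repeated random train/test splits give distributions of sensitivity and specificity.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import StratifiedShuffleSplit

rng = np.random.default_rng(3)
n_asthma, n_control, n_compounds = 32, 27, 120
y = np.r_[np.ones(n_asthma), np.zeros(n_control)]
X = rng.normal(size=(y.size, n_compounds)) + y[:, None] * 0.35  # synthetic shift

# Monte Carlo cross-validation: 200 repeated random 70/30 splits.
splits = StratifiedShuffleSplit(n_splits=200, test_size=0.3, random_state=0)
sens, spec = [], []
for tr, te in splits.split(X, y):
    pls = PLSRegression(n_components=3).fit(X[tr], y[tr])
    pred = pls.predict(X[te]).ravel() > 0.5
    sens.append(np.mean(pred[y[te] == 1]))      # asthmatics flagged as asthmatic
    spec.append(np.mean(~pred[y[te] == 0]))     # controls flagged as control
print(f"sensitivity {np.mean(sens):.2f}, specificity {np.mean(spec):.2f}")
```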
Abstract:
The relation between metabolic demand and maximal oxygen consumption during exercise has been investigated in different areas of knowledge. In the health field, the determination of maximal oxygen consumption (VO2max) is considered a method to classify the level of physical fitness or the risk of cardiocirculatory diseases. Accurate data acquisition provides a better evaluation of functional responses and reduces the error margin both when classifying risk and when determining aerobic exercise workload. In Brazil, the use of respirometry combined with ergometric testing has become an option in cardiorespiratory evaluation. This equipment allows inferences about oxidation-reduction processes, making it possible to identify physiological responses to physical effort such as the ventilatory thresholds. This thesis focused on the development of mathematical models, built by multiple regression and validated by the stepwise method, aiming to predict VO2max from respiratory responses to physical effort. The sample was composed of 181 healthy individuals, men and women, who were randomized into two groups: a regression group and a cross-validation group (GV). The volunteers underwent an incremental treadmill test to determine the second ventilatory threshold (LVII) and peak VO2. Using the forward-addition method, 11 models for treadmill VO2max prediction were developed. No significant differences were found between the measured VO2max and the values predicted by the models when compared using one-way ANOVA and the Tukey post hoc test. We concluded that the developed mathematical models allow prediction of the VO2max of healthy young individuals based on the LVII.
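A reduced sketch of the forward-addition approach described above, with synthetic respiratory variables as stand-ins: predictors are added to a linear model by forward selection on the regression group, and the resulting model is then checked against the held-out cross-validation group.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n, p = 181, 12                        # 181 subjects, 12 candidate respiratory variables
X = rng.normal(size=(n, p))
vo2max = 40 + 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 2, n)   # synthetic outcome

# Random split into a regression group and a cross-validation group.
X_reg, X_val, y_reg, y_val = train_test_split(X, vo2max, test_size=0.33,
                                              random_state=0)
sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=2,
                                direction="forward").fit(X_reg, y_reg)
keep = sfs.get_support()
model = LinearRegression().fit(X_reg[:, keep], y_reg)
print("selected variables:", np.flatnonzero(keep))
print(f"validation-group R^2: {model.score(X_val[:, keep], y_val):.3f}")
```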
Abstract:
This work analyzes the behavior of the gas flow of plunger lift wells producing to well test separators on offshore production platforms, in order to propose a technical procedure for estimating the gas flow during the slug production period. The motivation for this work arose from wells equipped with the plunger lift method by PETROBRAS in the Ubarana offshore field, off the coast of Rio Grande do Norte State, where the produced fluids are measured in well test separators on the platform. The artificial lift method called plunger lift is used when the available energy of the reservoir is not high enough to overcome all the load losses necessary to lift the oil from the bottom of the well to the surface continuously. The method consists, basically, of a free piston acting as a mechanical interface between the formation gas and the produced liquids, greatly increasing the well's lifting efficiency. A pneumatic control valve mounted on the flow line controls the cycles: when this valve opens, the plunger moves from the bottom of the well to the surface, lifting all the oil and gas above it until it reaches the well test separator, where the fluids are measured. The well test separator is used to measure all the volumes produced by the well during a certain period of time called a production test. In most cases, separators are designed to measure stabilized flow, that is, reasonably constant flow, through the use of electronic level and pressure controllers (PLC) and under the assumption of a steady pressure inside the separator. In plunger lift wells, however, the liquid and gas flows at the surface are cyclic and unstable, which causes slugs inside the separator, mainly in the gas phase, and introduces significant errors in the measurement system (e.g., overrange errors). The gas flow analysis proposed in this work is based on two mathematical models used together: i) a plunger lift well model proposed by Baruzzi [1], with later modifications by Bolonhini [2] to build a plunger lift simulator; ii) a two-phase separator model (gas + liquid) derived from a three-phase separator model (gas + oil + water) proposed by Nunes [3]. Based on these models and on field data collected from the well test separator of the PUB-02 platform (Ubarana field), it was possible to demonstrate that the output gas flow of the separator can be estimated, with reasonable precision, from the control signal of the Pressure Control Valve (PCV). Several models from the MATLAB® System Identification Toolbox were analyzed to evaluate which one best fit the field data. The models were validated using the AIC criterion as well as a variant of the cross-validation criterion. The ARX model fit the data best, so we also evaluated a recursive algorithm (RARX) on real-time data. The results were quite promising, indicating the viability of estimating the output gas flow rate of a plunger lift well producing to a well test separator from the control signal to the PCV.
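A minimal sketch, in the spirit of the System Identification Toolbox workflow, of fitting ARX models of different orders by least squares and ranking them with AIC. The control signal u and response y below are synthetic stand-ins for the PCV signal and the separator gas flow.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 500
u = rng.normal(size=N)                    # PCV control signal (synthetic)
y = np.zeros(N)
for t in range(2, N):                     # "true" system generating the data
    y[t] = 1.2 * y[t-1] - 0.4 * y[t-2] + 0.8 * u[t-1] + 0.1 * rng.normal()

def fit_arx(y, u, na, nb):
    """Least-squares ARX(na, nb) fit; returns parameters and AIC."""
    n0 = max(na, nb)
    rows = [np.r_[-y[t-na:t][::-1], u[t-nb:t][::-1]] for t in range(n0, len(y))]
    Phi, target = np.asarray(rows), y[n0:]
    theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    rss = np.sum((target - Phi @ theta) ** 2)
    n = len(target)
    return theta, n * np.log(rss / n) + 2 * (na + nb)   # AIC

for na, nb in [(1, 1), (2, 1), (2, 2), (3, 3)]:
    _, aic = fit_arx(y, u, na, nb)
    print(f"ARX({na},{nb}): AIC = {aic:.1f}")
```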
Abstract:
One of the most important goals of bioinformatics is the ability to identify genes in the uncharacterized DNA sequences of worldwide databases. Gene expression in prokaryotes initiates when the RNA-polymerase enzyme interacts with DNA regions called promoters, where the main regulatory elements of the transcription process are located. Despite the improvement of in vitro techniques for molecular biology analysis, characterizing and identifying a great number of promoters in a genome is a complex task. Moreover, the main drawback is the absence of a large set of promoters from which to identify conserved patterns among species; an in silico method to predict them in any species is therefore a challenge. Improved promoter prediction methods can be one step towards developing more reliable ab initio gene prediction methods. In this work, we present an empirical comparison of Machine Learning (ML) techniques such as Naïve Bayes, Decision Trees, Support Vector Machines, Neural Networks, Voted Perceptron, PART, k-NN and ensemble approaches (Bagging and Boosting) on the task of predicting Bacillus subtilis promoters. To do so, we first built two data sets of promoter and non-promoter sequences: one for B. subtilis and a hybrid one. To evaluate the ML methods, a cross-validation procedure was applied. Good results were obtained with ML methods such as SVM and Naïve Bayes on the B. subtilis data set; however, we did not reach good results on the hybrid database.
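A sketch of the evaluation protocol under stated assumptions: the sequences below are random stand-ins (not real B. subtilis promoters), and k-mer counts, a common sequence encoding chosen here for illustration, serve as features for classifiers scored by ten-fold cross-validation.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

rng = np.random.default_rng(6)

def random_seq(bias):
    """Random 80-bp sequence; 'promoters' get a mild AT bias."""
    p = np.array([0.25] * 4) + bias
    return "".join(rng.choice(list("ACGT"), size=80, p=p / p.sum()))

seqs = [random_seq(np.array([0.1, 0.0, 0.0, 0.1])) for _ in range(100)] + \
       [random_seq(np.zeros(4)) for _ in range(100)]
y = np.r_[np.ones(100), np.zeros(100)]          # promoter vs non-promoter

X = CountVectorizer(analyzer="char", ngram_range=(3, 3)).fit_transform(seqs)
for name, clf in [("SVM", SVC()), ("Naive Bayes", MultinomialNB())]:
    scores = cross_val_score(clf, X, y, cv=10)
    print(f"{name}: accuracy {scores.mean():.3f} +/- {scores.std():.3f}")
```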
Abstract:
Nowadays, classifying proteins into structural classes, which concerns the inference of patterns in their 3D conformation, is one of the most important open problems in Molecular Biology. The main reason for this is that the function of a protein is intrinsically related to its spatial conformation. However, such conformations are very difficult to obtain experimentally in the laboratory. Thus, this problem has drawn the attention of many researchers in Bioinformatics. Considering the great difference between the number of protein sequences already known and the number of three-dimensional structures determined experimentally, the demand for automated techniques for structural classification of proteins is very high. In this context, computational tools, especially Machine Learning (ML) techniques, have become essential to deal with this problem. In this work, ML techniques are used in the recognition of protein structural classes: Decision Trees, k-Nearest Neighbor, Naive Bayes, Support Vector Machine and Neural Networks. These methods were chosen because they represent different learning paradigms and have been widely used in the Bioinformatics literature. Aiming to improve the performance of these techniques (individual classifiers), homogeneous (Bagging and Boosting) and heterogeneous (Voting, Stacking and StackingC) multi-classification systems are used. Moreover, since the protein database used in this work presents the problem of imbalanced classes, artificial class-balancing techniques (Random Undersampling, Tomek Links, CNN, NCL and OSS) are used to minimize the problem. To evaluate the ML methods, a cross-validation procedure is applied, in which the accuracy of the classifiers is measured by the mean classification error rate on independent test sets. These means are compared, two by two, by hypothesis testing to evaluate whether there is a statistically significant difference between them. Among the individual classifiers, Support Vector Machine presented the best accuracy. The multi-classification systems (homogeneous and heterogeneous) showed, in general, performance superior or similar to that of the individual classifiers, especially Boosting with Decision Trees and StackingC with Linear Regression as meta-classifier. The Voting method, despite its simplicity, proved adequate for the problem addressed in this work. The class-balancing techniques, on the other hand, did not produce a significant improvement in the global classification error; nevertheless, they did improve the classification error for the minority class. In this context, the NCL technique proved the most appropriate.
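A compact sketch of one comparison above, assuming a synthetic imbalanced multiclass set in place of the protein data: several individual classifiers and a heterogeneous Voting ensemble built from them are scored by ten-fold cross-validation (soft voting is an illustrative choice, not the dissertation's exact configuration).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Four imbalanced classes stand in for the protein structural classes.
X, y = make_classification(n_samples=600, n_features=20, n_classes=4,
                           n_informative=8, weights=[0.5, 0.3, 0.15, 0.05],
                           random_state=0)

members = [("svm", SVC(probability=True)), ("nb", GaussianNB()),
           ("knn", KNeighborsClassifier()), ("tree", DecisionTreeClassifier())]
models = dict(members, voting=VotingClassifier(members, voting="soft"))
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=10)
    print(f"{name}: mean accuracy {acc.mean():.3f}")
```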
Abstract:
One of the current major concerns in engineering is the development of aircraft with low fuel consumption and high performance. To this end, airfoils with a high Lift Coefficient and a low Drag Coefficient, yielding high-efficiency airfoils, are studied and designed: as efficiency increases, the aircraft's fuel consumption decreases, improving its performance. This work therefore aims to develop a tool for designing airfoils from desired characteristics, such as the Lift and Drag coefficients and the maximum Efficiency, using an algorithm based on an Artificial Neural Network (ANN). For this, an aerodynamic characteristics database of 300 airfoils was first collected from the XFoil software. Then, using MATLAB, several network architectures, both modular and hierarchical, were trained with the Back-propagation algorithm and the Momentum rule. For data analysis, cross-validation was used, selecting the network with the lowest Root Mean Square (RMS) error. The best result was obtained for a hierarchical architecture with two modules and one hidden layer. The airfoils produced by that network, in the regions of lowest RMS, were compared with the same airfoils imported into XFoil.
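A reduced sketch of the architecture-selection step, with random placeholders for the XFoil data and scikit-learn's MLPRegressor standing in for the MATLAB back-propagation networks: candidate hidden-layer sizes are compared by k-fold cross-validated RMS error, and the lowest wins.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 3))      # desired Cl, Cd, efficiency for 300 airfoils
y = X @ rng.normal(size=3) + 0.1 * rng.normal(size=300)   # stand-in design target

for hidden in [(5,), (10,), (10, 5)]:
    net = make_pipeline(StandardScaler(),
                        MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000,
                                     random_state=0))
    mse = -cross_val_score(net, X, y, cv=KFold(5, shuffle=True, random_state=0),
                           scoring="neg_mean_squared_error")
    print(f"hidden layers {hidden}: cross-validated RMS = {np.sqrt(mse.mean()):.4f}")
```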
Abstract:
Expanded Bed Adsorption (EBA) is an integrative process that combines concepts of chromatography and fluidization of solids. The many parameters involved and their synergistic effects complicate the optimization of the process. Fortunately, some mathematical tools have been developed to guide the investigation of the EBA system. In this work the application of experimental design, phenomenological modeling and artificial neural networks (ANN) to understanding chitosanase adsorption on the ion exchange resin Streamline® DEAE has been investigated. The strain Paenibacillus ehimensis NRRL B-23118 was used for chitosanase production. EBA experiments were carried out using a column of 2.6 cm inner diameter and 30.0 cm height coupled to a peristaltic pump; at the bottom of the column there was a glass-bead distributor 3.0 cm in height. Residence time distribution (RTD) assays revealed a high degree of mixing; however, the Richardson-Zaki coefficients showed that the column was on the threshold of stability. Isotherm models fitted the adsorption equilibrium data in the presence of lyotropic salts. The experimental design results indicated that ionic strength and superficial velocity are important for the recovery and purity of chitosanases. The molecular masses of the two chitosanases were approximately 23 kDa and 52 kDa, as estimated by SDS-PAGE. The phenomenological modeling aimed to describe the operations in batch and column chromatography; the simulations were performed in Microsoft Visual Studio. The kinetic rate constant model fitted the kinetic curves efficiently at initial enzyme activities of 0.232, 0.142 and 0.079 UA/mL. The simulated breakthrough curves showed some differences from the experimental data, especially regarding the slope. Sensitivity tests of the model with respect to superficial velocity, axial dispersion and initial concentration showed agreement with the literature. The neural network was built in MATLAB with the Neural Network Toolbox. Cross-validation was used to improve its generalization ability. The ANN parameters were tuned to obtain the 6-6 (enzyme activity) and 9-6 (total protein) configurations, with the tansig transfer function and the Levenberg-Marquardt training algorithm. The neural network simulations, including all the steps of the cycle, showed good agreement with the experimental data, with a correlation coefficient of approximately 0.974. The effects of the input variables on the profiles of the loading, washing and elution stages were consistent with the literature.
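A hedged sketch of fitting a kinetic rate-constant adsorption model to a batch uptake curve by least squares; the model form (reversible second-order binding), the parameter names and the "data" are illustrative stand-ins, not the dissertation's actual system.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit

# Synthetic batch uptake curve q(t) with a little noise.
t_data = np.linspace(0, 60, 15)                               # minutes
q_data = (0.9 * (1 - np.exp(-0.12 * t_data))
          + np.random.default_rng(8).normal(0, 0.02, 15))

def kinetic_model(t, k1, k2, qm, c0=0.232):
    """Integrate dq/dt = k1*c0*(qm - q) - k2*q from q(0) = 0."""
    sol = solve_ivp(lambda _, q: k1 * c0 * (qm - q) - k2 * q,
                    (0.0, t.max()), [0.0], t_eval=t)
    return sol.y[0]

popt, _ = curve_fit(kinetic_model, t_data, q_data,
                    p0=[1.0, 0.05, 1.0], bounds=(0, np.inf))
print("fitted k1, k2, qm:", np.round(popt, 3))
```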
Abstract:
This work combines the potential of near-infrared (NIR) spectroscopy and chemometrics to determine the content of diclofenac tablets without destroying the sample; ultraviolet spectroscopy, one of the official methods, was used as the reference method. In the construction of the multivariate calibration models, several types of pre-processing of the NIR spectral data were studied, such as scatter correction and first derivative. The regression method used was PLS (partial least squares). The NIR spectra of a set of 90 tablets were divided into two sets (calibration and prediction): 54 samples were used for calibration and 36 for prediction, since the calibration method used was full cross-validation, which eliminates the need for a separate validation set. The models were evaluated by the correlation coefficient R², the root mean square error of calibration (RMSEC) and the root mean square error of prediction (RMSEP). The values predicted for the remaining 36 samples were consistent with those obtained by UV spectroscopy.
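A sketch of the calibration workflow with synthetic spectra standing in for the NIR data: the number of PLS latent variables is chosen by full (leave-one-out) cross-validation on the 54-sample calibration set, then RMSEC and RMSEP are computed on the calibration and prediction sets.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(9)
spectra = rng.normal(size=(90, 200))               # 90 tablets x 200 wavelengths
content = 50 + spectra[:, :5].sum(axis=1) + rng.normal(0, 0.3, 90)
cal, pred = np.arange(54), np.arange(54, 90)       # 54 calibration, 36 prediction

rmse = lambda a, b: np.sqrt(np.mean((a - b) ** 2))

# Full cross-validation: leave-one-out error over candidate latent-variable counts.
best = min(range(1, 11), key=lambda k: rmse(
    cross_val_predict(PLSRegression(k), spectra[cal], content[cal],
                      cv=LeaveOneOut()).ravel(), content[cal]))

pls = PLSRegression(best).fit(spectra[cal], content[cal])
print(f"latent variables: {best}")
print(f"RMSEC = {rmse(pls.predict(spectra[cal]).ravel(), content[cal]):.3f}")
print(f"RMSEP = {rmse(pls.predict(spectra[pred]).ravel(), content[pred]):.3f}")
```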
Abstract:
OBJECTIVE: To perform the cross-cultural adaptation of the Portuguese version of the Maslach Burnout Inventory for students and to investigate its reliability, validity and cross-cultural invariance. METHODS: Face validation involved the participation of a multidisciplinary team. Content validation was performed. The Portuguese version was completed online in 2009 by 958 Brazilian and 556 Portuguese urban university students. Confirmatory factor analysis was carried out using χ²/df, the comparative fit index (CFI), the goodness-of-fit index (GFI) and the root mean square error of approximation (RMSEA) as fit indices. To verify the stability of the factor solution relative to the original English version, cross-validation was performed on two-thirds of the total sample and replicated in the remaining third. Convergent validity was estimated by the average variance extracted and composite reliability. Discriminant validity was assessed, and internal consistency was estimated by Cronbach's alpha coefficient. Concurrent validity was estimated by correlational analysis of the Portuguese version with the mean scores of the Copenhagen Burnout Inventory; divergent validity was assessed by comparison with the Beck Depression Inventory. Model invariance between the Brazilian and Portuguese samples was evaluated. RESULTS: The three-factor model of Exhaustion, Cynicism and Efficacy showed adequate fit (χ²/df = 8.498; CFI = 0.916; GFI = 0.902; RMSEA = 0.086). The factor structure was stable (λ: χ²dif = 11.383, p = 0.50; Cov: χ²dif = 6.479, p = 0.372; Residuals: χ²dif = 21.514, p = 0.121). Adequate convergent validity (AVE = 0.45–0.64; CR = 0.82–0.88), discriminant validity (ρ² = 0.06–0.33) and internal consistency (α = 0.83–0.88) were observed. The concurrent validity of the Portuguese version with the Copenhagen Inventory was adequate (r = 0.21–0.74). The assessment of the instrument's divergent validity was hampered by the closeness of the theoretical concepts of the Exhaustion and Cynicism dimensions of the Portuguese version to the Beck scale. Instrument invariance between the Brazilian and Portuguese samples was not observed (λ: χ²dif = 84.768, p < 0.001; Cov: χ²dif = 129.206, p < 0.001; Residuals: χ²dif = 518.760, p < 0.001). CONCLUSIONS: The Portuguese version of the Maslach Burnout Inventory for students showed adequate reliability and validity, but its factor structure was not invariant across countries, indicating a lack of cross-cultural stability.
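A small sketch, outside any CFA package, of two of the reliability statistics reported above: Cronbach's alpha computed from a (synthetic) item-score matrix, and composite reliability computed from assumed standardized factor loadings.

```python
import numpy as np

rng = np.random.default_rng(10)
factor = rng.normal(size=300)
# 300 respondents x 5 items loading on one synthetic factor.
items = 0.8 * factor[:, None] + rng.normal(0, 0.6, size=(300, 5))

def cronbach_alpha(x):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum()
                          / x.sum(axis=1).var(ddof=1))

def composite_reliability(loadings):
    """CR = (sum lambda)^2 / ((sum lambda)^2 + sum(1 - lambda^2))."""
    lam = np.asarray(loadings)
    return lam.sum() ** 2 / (lam.sum() ** 2 + (1 - lam ** 2).sum())

print(f"alpha = {cronbach_alpha(items):.2f}")
print(f"CR (loadings 0.70/0.75/0.80) = {composite_reliability([0.70, 0.75, 0.80]):.2f}")
```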
Abstract:
Objective: To identify potential prognostic factors for pulmonary thromboembolism (PTE), establishing a mathematical model to predict the risk of fatal and nonfatal PTE. Method: The reports on 4,813 consecutive autopsies performed from 1979 to 1998 in a Brazilian tertiary referral medical school were reviewed for a retrospective study. From the medical records and autopsy reports of the 512 patients found with macroscopically and/or microscopically documented PTE, data on demographics, underlying diseases, and probable PTE site of origin were gathered and studied by multiple logistic regression. Thereafter, the jackknife method, a statistical cross-validation technique that uses the original study patients to validate a clinical prediction rule, was performed. Results: The autopsy rate was 50.2%, and PTE prevalence was 10.6%. In 212 cases, PTE was the main cause of death (fatal PTE). The independent variables selected by the regression significance criteria that were more likely to be associated with fatal PTE were age (odds ratio [OR], 1.02; 95% confidence interval [CI], 1.00 to 1.03), trauma (OR, 8.5; 95% CI, 2.20 to 32.81), right-sided cardiac thrombi (OR, 1.96; 95% CI, 1.02 to 3.77), and pelvic vein thrombi (OR, 3.46; 95% CI, 1.19 to 10.05); those most likely to be associated with nonfatal PTE were systemic arterial hypertension (OR, 0.51; 95% CI, 0.33 to 0.80), pneumonia (OR, 0.46; 95% CI, 0.30 to 0.71), and sepsis (OR, 0.16; 95% CI, 0.06 to 0.40). The results obtained from applying the equation to the 512 cases studied suggest that logit p > 0.336 favors the occurrence of fatal PTE, logit p < -1.142 favors nonfatal PTE, and intermediate values of logit p are not conclusive. The cross-validation prediction misclassification rate was 25.6%, meaning that the prediction equation correctly classified the majority of the cases (74.4%). Conclusions: Although the usefulness of this method in everyday medical practice needs to be confirmed by a prospective study, for the time being our results suggest that, concerning the prevention, diagnosis, and treatment of PTE, strict attention should be given to patients presenting the variables that are significant in the logistic regression model.
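A minimal sketch of the jackknife step, with synthetic data standing in for the autopsy series: each case is left out in turn, the logistic model is refitted on the remaining cases, and the held-out case is classified, yielding a cross-validated misclassification rate.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# 512 synthetic cases with 7 candidate prognostic variables.
X, y = make_classification(n_samples=512, n_features=7, random_state=0)

pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                         cv=LeaveOneOut())
print(f"jackknife misclassification rate: {np.mean(pred != y):.1%}")
```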