995 results for Wrapper approach
Abstract:
Besides optimizing classifier predictive performance and addressing the curse of dimensionality, feature selection techniques help keep a classification model as simple as possible. In this paper, we present a wrapper feature selection approach based on the Bat Algorithm (BA) and Optimum-Path Forest (OPF), in which we model feature selection as a binary optimization problem, guided by BA using the OPF accuracy over a validating set as the fitness function to be maximized. Moreover, we present a methodology to better estimate the quality of the reduced feature set. Experiments conducted over six public datasets demonstrated that the proposed approach provides statistically significantly more compact feature sets and, in some cases, can indeed improve classification effectiveness. © 2013 Elsevier Ltd. All rights reserved.
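The wrapper idea in this abstract (score a binary feature mask by the accuracy a classifier reaches on a validating set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a plain 1-nearest-neighbour rule stands in for the OPF classifier, and the function names and toy data are hypothetical.

```python
from typing import List, Sequence


def select(x: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Keep only the features whose mask bit is 1."""
    return [value for value, keep in zip(x, mask) if keep]


def nn_predict(train_X, train_y, x):
    """1-nearest-neighbour prediction (stand-in for OPF, which is an assumption here)."""
    nearest = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[nearest]


def wrapper_fitness(mask, train_X, train_y, val_X, val_y) -> float:
    """Fitness of a feature mask: accuracy on the validating set using only selected features."""
    if not any(mask):
        return 0.0  # an empty feature set cannot classify anything
    reduced_train = [select(x, mask) for x in train_X]
    hits = sum(
        nn_predict(reduced_train, train_y, select(x, mask)) == y
        for x, y in zip(val_X, val_y)
    )
    return hits / len(val_y)


# Hypothetical toy data: feature 0 separates the classes, feature 1 is misleading noise.
TRAIN_X = [[0.0, 5.0], [0.1, -5.0], [1.0, 5.1], [0.9, -5.2]]
TRAIN_Y = [0, 0, 1, 1]
VAL_X = [[0.02, -5.15], [0.98, 5.02]]
VAL_Y = [0, 1]

if __name__ == "__main__":
    for mask in ([1, 0], [0, 1], [1, 1]):
        print(mask, wrapper_fitness(mask, TRAIN_X, TRAIN_Y, VAL_X, VAL_Y))
```

An optimizer such as BA would call `wrapper_fitness` once per candidate mask and keep the mask with the highest score, which is why the speed of the underlying classifier matters in a wrapper setting.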
Abstract:
In this study, a wrapper approach was applied to objectively select the most important variables related to two different anaerobic digestion imbalances, acidogenic states and foaming. This feature selection method, implemented in artificial neural networks (ANN), was performed using input and output data from a fully instrumented pilot plant (1 m³ upflow fixed bed digester). Results for acidogenic states showed that pH, volatile fatty acids, and inflow rate were the most relevant variables. Results for foaming showed that inflow rate and total organic carbon were among the relevant variables, both of which were related to the feed loading of the digester. Because there is no complete agreement on the causes of foaming, these results highlight the role of digester feeding patterns in the development of foaming.
Abstract:
Feature selection aims to find the most important information from a given set of features. As this task can be seen as an optimization problem, the combinatorial growth of the possible solutions may make an exhaustive search infeasible. In this paper we propose a new nature-inspired feature selection technique based on the behaviour of bats, which has never been applied to this context so far. The wrapper approach combines the exploration power of the bats with the speed of the Optimum-Path Forest classifier to find the set of features that maximizes the accuracy on a validating set. Experiments conducted on five public datasets have demonstrated that the proposed approach can outperform some well-known swarm-based techniques. © 2012 IEEE.
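To make the combinatorial growth concrete: a set of n features has 2^n - 1 non-empty subsets, each of which an exhaustive wrapper search would have to train and evaluate. A short sketch (the function name is illustrative only):

```python
def count_feature_subsets(n_features: int) -> int:
    """Number of non-empty feature subsets an exhaustive search would evaluate."""
    if n_features < 0:
        raise ValueError("n_features must be non-negative")
    return 2 ** n_features - 1


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, count_feature_subsets(n))
```

Each additional feature doubles the search space, which is the motivation for heuristic searches such as the swarm-based techniques mentioned above.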
Abstract:
Feature selection aims to find the most important information to save computational effort and data storage. We formulated this task as a combinatorial optimization problem since the exponential growth of possible solutions makes an exhaustive search infeasible. In this work, we propose a new nature-inspired feature selection technique based on the behavior of bats, namely, the binary bat algorithm. The wrapper approach combines the exploration power of the bats with the speed of the optimum-path forest classifier to find a better data representation. Experiments on public datasets have shown that the proposed technique can indeed improve the effectiveness of the optimum-path forest and outperform some well-known swarm-based techniques. © 2013 Elsevier Inc. All rights reserved.
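The abstract does not spell out how bat positions are made binary. One commonly described scheme, assumed here for illustration only, passes each velocity component through a sigmoid and compares it with a uniform random number to decide whether the corresponding feature is selected. The function names and parameters below are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def update_velocity(velocity: Sequence[float], position: Sequence[int],
                    best_position: Sequence[int], frequency: float) -> List[float]:
    """Bat-style velocity update: v <- v + (x - x_best) * f."""
    return [v + (x - b) * frequency
            for v, x, b in zip(velocity, position, best_position)]


def binarize_position(velocity: Sequence[float], rng: random.Random) -> List[int]:
    """Assumed binarization: feature j is selected when sigmoid(v_j) exceeds a uniform draw."""
    return [1 if sigmoid(v) > rng.random() else 0 for v in velocity]


if __name__ == "__main__":
    rng = random.Random(42)
    velocity = update_velocity([0.0, 0.0, 0.0], [1, 0, 1], [0, 0, 1], frequency=2.0)
    print(velocity, binarize_position(velocity, rng))
```

The resulting 0/1 vector is the feature mask whose quality the wrapper then scores with the classifier.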
Abstract:
Feature selection aims to find the most important information from a given set of features. As this task can be seen as an optimization problem, the combinatorial growth of the possible solutions may make an exhaustive search infeasible. In this paper we propose a new nature-inspired feature selection technique based on the Charged System Search (CSS), which has never been applied to this context so far. The wrapper approach combines the exploration power of CSS with the speed of the Optimum-Path Forest classifier to find the set of features that maximizes the accuracy on a validating set. Experiments conducted on four public datasets have demonstrated the validity of the proposed approach, which can outperform some well-known swarm-based techniques. © 2013 Springer-Verlag.
Abstract:
Methods for accessing data on the Web have been the focus of active research over the past few years. In this thesis we propose a method for representing Web sites as data sources. We designed Data Extractor, a data retrieval solution that allows us to define queries to Web sites and process the resulting data sets. Data Extractor is being integrated into the MSemODB heterogeneous database management system. With its help, database queries can be distributed over both local and Web data sources within the MSemODB framework. Data Extractor treats Web sites as data sources, controlling query execution and data retrieval. It works as an intermediary between the applications and the sites. Data Extractor utilizes a twofold "custom wrapper" approach for information retrieval. Wrappers for the majority of sites are easily built using a powerful and expressive scripting language, while complex cases are processed using Java-based wrappers that utilize a specially designed library of data retrieval, parsing and Web access routines. In addition to wrapper development, we thoroughly investigate issues associated with Web site selection, analysis and processing. Data Extractor is designed to act as a data retrieval server, as well as an embedded data retrieval solution. We also use it to create mobile agents that are shipped over the Internet to the client's computer to perform data retrieval on behalf of the user. This approach allows Data Extractor to distribute and scale well. This study confirms the feasibility of building custom wrappers for Web sites. This approach provides accuracy of data retrieval, and power and flexibility in the handling of complex cases.
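As a rough illustration of what "treating a Web site as a data source" can mean, the sketch below wraps an HTML table so that it can be queried like a list of records. It is not Data Extractor's scripting language or its Java library; the class, function, and sample page are hypothetical, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser
from typing import Dict, List, Sequence


class TableWrapper(HTMLParser):
    """Hypothetical site wrapper: turns the data rows of an HTML table into records."""

    def __init__(self, columns: Sequence[str]) -> None:
        super().__init__()
        self.columns = list(columns)
        self.records: List[Dict[str, str]] = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            # Header rows (th cells only) yield no td values and are skipped.
            if len(self._row) == len(self.columns):
                cells = [cell.strip() for cell in self._row]
                self.records.append(dict(zip(self.columns, cells)))
            self._row = None
            self._in_cell = False


def query(html: str, columns: Sequence[str], **filters: str) -> List[Dict[str, str]]:
    """Run an equality-filter 'query' against a page through the wrapper."""
    wrapper = TableWrapper(columns)
    wrapper.feed(html)
    wrapper.close()
    return [r for r in wrapper.records
            if all(r.get(k) == v for k, v in filters.items())]


PAGE = ("<table><tr><th>title</th><th>year</th></tr>"
        "<tr><td>Alpha</td><td>1999</td></tr>"
        "<tr><td>Beta</td><td>2001</td></tr></table>")

if __name__ == "__main__":
    print(query(PAGE, ["title", "year"], year="2001"))
```

A real wrapper would also fetch the page and cope with site-specific layout, which is where the custom, per-site logic described in the abstract comes in.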
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover ground truth. An application to real data, drawn from official statistics, shows its usefulness.
Abstract:
Thesis submitted to Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa, in partial fulfilment of the requirements for the degree of Master in Computer Science
Abstract:
Xanthomonas citri subsp. citri (X. citri) is the causative agent of citrus canker, a disease that affects several citrus plants in Brazil and across the world. Although many studies have demonstrated the importance of genes for infection and pathogenesis in this bacterium, there are no data related to phosphate uptake and assimilation pathways. To identify the proteins that are involved in the phosphate response, we performed a proteomic analysis of X. citri extracts after growth in three culture media with different phosphate concentrations. Using mass spectrometry and bioinformatics analysis, we showed that X. citri conserved orthologous genes from the Pho regulon in Escherichia coli, including the two-component system PhoR/PhoB, the ATP binding cassette (ABC) transporter Pst for phosphate uptake, and the alkaline phosphatase PhoA. Analysis performed under phosphate starvation provided evidence of the relevance of the Pst system for phosphate uptake, as well as both periplasmic binding proteins, PhoX and PstS, which were formed in high abundance. The results from this study are the first evidence of Pho regulon activation in X. citri and bring new insights for studies related to bacterial metabolism and physiology. Biological significance: Using proteomics and bioinformatics analysis, we showed for the first time that the phytopathogenic bacterium X. citri conserves a set of proteins that belong to the Pho regulon, which are induced during phosphate starvation. The most relevant in terms of conservation and up-regulation were the periplasmic-binding proteins PstS and PhoX from the ABC transporter PstSBAC for phosphate, the two-component system composed of PhoR/PhoB, and the alkaline phosphatase PhoA.
Abstract:
In the current study, a new approach has been developed for correcting the effect that moisture reduction after virgin olive oil (VOO) filtration exerts on the apparent increase of the secoiridoid content by using an internal standard during extraction. Firstly, two main Spanish varieties (Picual and Hojiblanca) were submitted to industrial filtration of VOOs. Afterwards, the moisture content was determined in unfiltered and filtered VOOs, and liquid-liquid extraction of phenolic compounds was performed using different internal standards. The resulting extracts were analyzed by HPLC-ESI-TOF/MS, in order to gain maximum information concerning the phenolic profiles of the samples under study. The reduction effect of filtration on the moisture content, phenolic alcohols, and flavones was confirmed at the industrial scale. Oleuropein was chosen as internal standard and, for the first time, the apparent increase of secoiridoids in filtered VOO was corrected, using a correction coefficient (Cc) calculated from the variation of internal standard area in filtered and unfiltered VOO during extraction. This approach gave the real concentration of secoiridoids in filtered VOO, and clarified the effect of the filtration step on the phenolic fraction. This finding is of great importance for future studies that seek to quantify phenolic compounds in VOOs.
Abstract:
Lipidic mixtures present a particular phase change profile highly affected by their unique crystalline structure. However, classical solid-liquid equilibrium (SLE) thermodynamic modeling approaches, which assume the solid phase to be a pure component, sometimes fail to describe the phase behavior correctly, and this failure becomes more pronounced as the complexity of the system increases. To overcome some of these problems, this study describes a new procedure to depict the SLE of fatty binary mixtures presenting solid solutions, namely the Crystal-T algorithm. Considering the non-ideality of both liquid and solid phases, this algorithm is aimed at determining the temperatures at which the first and last crystals of the mixture melt. The evaluation is focused on experimental data measured and reported in this work for systems composed of triacylglycerols and fatty alcohols. The liquidus and solidus lines of the SLE phase diagrams were described by using excess Gibbs energy based equations, and the group contribution UNIFAC model for the calculation of the activity coefficients of both liquid and solid phases. Very low deviations between theoretical and experimental data demonstrated the strength of the algorithm, contributing to broadening the scope of SLE modeling.
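For orientation, the textbook SLE condition for component i when both phases are treated as non-ideal solutions can be written as below. This is the standard simplified form (heat-capacity and solid-solid transition terms neglected), given here as an assumption; the abstract does not state the exact equations used by the Crystal-T algorithm.

```latex
\ln\frac{x_i^{S}\,\gamma_i^{S}}{x_i^{L}\,\gamma_i^{L}}
  = \frac{\Delta_{\mathrm{fus}}H_i}{R}
    \left(\frac{1}{T} - \frac{1}{T_{\mathrm{fus},i}}\right),
\qquad i = 1, 2
```

Here x_i and γ_i are the mole fraction and activity coefficient of component i in the solid (S) or liquid (L) phase, Δ_fus H_i and T_fus,i are its enthalpy and temperature of fusion, and R is the gas constant. Setting x_i^S γ_i^S = 1 recovers the classical pure-solid assumption that the abstract describes as insufficient for solid solutions.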
Abstract:
Objective: To analyze the effects of treatment approach on the outcomes of newborns (birth weight [BW] < 1,000 g) with patent ductus arteriosus (PDA) from the Brazilian Neonatal Research Network (BNRN): death, bronchopulmonary dysplasia (BPD), severe intraventricular hemorrhage (IVH III/IV), retinopathy of prematurity requiring surgery (ROPsur), necrotizing enterocolitis requiring surgery (NECsur), and death/BPD. Methods: This was a multicenter cohort study with retrospective data collection, including newborns (BW < 1,000 g) with gestational age (GA) < 33 weeks and echocardiographic diagnosis of PDA, from 16 neonatal units of the BNRN from January 1, 2010 to December 31, 2011. Newborns who died or were transferred by the third day of life, and those with congenital malformation or infection, were excluded. Groups: G1 - conservative approach (without treatment), G2 - pharmacologic (indomethacin or ibuprofen), G3 - surgical ligation (independent of previous treatment). Factors analyzed: antenatal corticosteroid, cesarean section, BW, GA, 5-min Apgar score < 4, male gender, Score for Neonatal Acute Physiology Perinatal Extension (SNAPPE II), respiratory distress syndrome (RDS), late sepsis (LS), mechanical ventilation (MV), surfactant (< 2 h of life), and time of MV. Outcomes: death, O2 dependence at 36 weeks (BPD36wks), IVH III/IV, ROPsur, NECsur, and death/BPD36wks. Statistical analysis: Student's t-test, chi-squared test, or Fisher's exact test; odds ratio (95% CI); binary logistic regression and backward stepwise multiple regression. Software: MedCalc (Medical Calculator), version 12.1.4.0. p-values < 0.05 were considered statistically significant. Results: 1,097 newborns were selected and 494 were included: G1 - 187 (37.8%), G2 - 205 (41.5%), and G3 - 102 (20.6%). The highest mortality was observed in G1 (51.3%) and the lowest in G3 (14.7%). The highest frequencies of BPD36wks (70.6%) and ROPsur (23.5%) were observed in G3. The lowest occurrence of death/BPD36wks was in G2 (58.0%). Pharmacological (OR 0.29; 95% CI: 0.14-0.62) and conservative (OR 0.34; 95% CI: 0.14-0.79) treatments were protective for the outcome death/BPD36wks. Conclusions: The conservative approach to PDA was associated with high mortality, the surgical approach with the occurrence of BPD36wks and ROPsur, and the pharmacological treatment was protective for the outcome death/BPD36wks.
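For readers unfamiliar with the "OR (95% CI)" notation, the sketch below computes an unadjusted odds ratio and its Wald confidence interval from a 2x2 table. The counts are hypothetical and are not taken from the study; the ORs reported above came from regression models and would not be reproduced by this simple calculation.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval.

    a: treated with outcome      b: treated without outcome
    c: reference with outcome    d: reference without outcome
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(odds_ratio_ci(10, 20, 20, 10))
```

An interval that lies entirely below 1, as in the abstract's reported values, is what is meant by a treatment being "protective" for the outcome.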
Abstract:
To assess quality of care of women with severe maternal morbidity and to identify associated factors. This is a national multicenter cross-sectional study performing surveillance for severe maternal morbidity, using the World Health Organization criteria. The expected number of maternal deaths was calculated with the maternal severity index (MSI) based on the severity of complication, and the standardized mortality ratio (SMR) for each center was estimated. Analyses on the adequacy of care were performed. Seventeen hospitals were classified as providing adequate care and 10 as providing nonadequate care. Besides an almost twofold increase in the maternal mortality ratio, the main factors associated with nonadequate performance were geographic difficulty in accessing health services (P < 0.001), delays related to quality of medical care (P = 0.012), absence of blood derivatives (P = 0.013), difficulties of communication between health services (P = 0.004), and any delay during the whole process (P = 0.039). This is an example of how evaluation of the performance of health services is possible, using a benchmarking tool specific to Obstetrics. In this study the MSI was a useful tool for identifying differences in maternal mortality ratios and factors associated with nonadequate performance of care.
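The SMR mentioned above is the ratio of observed to expected deaths. A minimal sketch follows, assuming the expected count for a center is the sum of per-case death probabilities predicted by a severity model such as the MSI. That aggregation, the function name, and the numbers are assumptions for illustration and are not details given in the abstract.

```python
from typing import Iterable


def standardized_mortality_ratio(observed_deaths: int,
                                 predicted_probabilities: Iterable[float]) -> float:
    """SMR = observed deaths / expected deaths for one center."""
    expected_deaths = sum(predicted_probabilities)
    if expected_deaths <= 0:
        raise ValueError("expected number of deaths must be positive")
    return observed_deaths / expected_deaths


if __name__ == "__main__":
    # Hypothetical center: four severe cases with model-predicted death probabilities.
    print(standardized_mortality_ratio(3, [0.5, 0.25, 0.25, 0.5]))
```

An SMR above 1 means more deaths occurred than the case severity would predict, which is the kind of signal a benchmarking exercise like this one looks for.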
Abstract:
Purified genomic DNA can be difficult to obtain from some plant species because of the presence of impurities such as polysaccharides, which are often co-extracted with DNA. In this study, we developed a fast, simple, and low-cost protocol for extracting DNA from plants containing high levels of secondary metabolites. This protocol does not require the use of volatile toxic reagents such as mercaptoethanol, chloroform, or phenol and allows the extraction of high-quality DNA from wild and cultivated tropical species.