972 results for Budget function classification
Abstract:
Social media classification problems have drawn increasing attention in the past few years. With the rapid development of the Internet and the popularity of computers, there is an astronomical amount of information on social networks (social media platforms). The datasets are generally large scale and are often corrupted by noise. The presence of noise in the training set has a strong impact on the performance of supervised learning (classification) techniques. A budget-driven one-class SVM approach is presented in this thesis that is suitable for large-scale social media data classification. Our approach is based on an existing online one-class SVM learning algorithm, referred to as the STOCS (Self-Tuning One-Class SVM) algorithm. To justify our choice, we first analyze the noise resilience of STOCS using synthetic data. The experiments suggest that STOCS is more robust against label noise than several other existing approaches. Next, to handle the big-data classification problem for social media data, we introduce several budget-driven features, which allow the algorithm to be trained within limited time and under limited memory requirements. In addition, the resulting algorithm can be easily adapted to changes in dynamic data with minimal computational cost. Compared with two state-of-the-art approaches, LibLinear and kNN, our approach is shown to be competitive, with lower memory and time requirements.
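The budget idea described above can be illustrated with a minimal sketch. This is not the STOCS algorithm itself; the class name, the eviction policy (drop the oldest support vector), and the fixed novelty threshold are all illustrative assumptions:

```python
import numpy as np

def rbf(x, y, gamma=0.5):
    """Gaussian RBF kernel between two points."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

class BudgetedOneClassSVM:
    """Online one-class learner with a hard budget on stored support vectors.

    A point is scored by its kernel similarity to the support set; points
    in novel regions (low score) are absorbed, and the oldest support
    vector is evicted once the budget is exceeded, bounding memory use.
    """
    def __init__(self, budget=50, threshold=0.3, gamma=0.5):
        self.budget, self.threshold, self.gamma = budget, threshold, gamma
        self.support = []                        # retained examples (bounded)

    def score(self, x):
        if not self.support:
            return 0.0
        return max(rbf(x, s, self.gamma) for s in self.support)

    def partial_fit(self, x):
        if self.score(x) < self.threshold:       # novel region: absorb it
            self.support.append(np.asarray(x, dtype=float))
            if len(self.support) > self.budget:  # enforce the budget
                self.support.pop(0)              # evict the oldest vector

    def predict(self, x):
        """+1 = inlier, -1 = outlier."""
        return 1 if self.score(x) >= self.threshold else -1
```

Because the support set never grows past the budget, both memory and per-point training cost stay bounded, which is the property the thesis exploits for large-scale data.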
Abstract:
This paper describes a new food classification which assigns foodstuffs according to the extent and purpose of the industrial processing applied to them. Three main groups are defined: unprocessed or minimally processed foods (group 1), processed culinary and food industry ingredients (group 2), and ultra-processed food products (group 3). The use of this classification is illustrated by applying it to data collected in the Brazilian Household Budget Survey, which was conducted in 2002/2003 through a probabilistic sample of 48,470 Brazilian households. The average daily food availability was 1,792 kcal/person, of which 42.5% came from group 1 (mostly rice, beans, meat, and milk), 37.5% from group 2 (mostly vegetable oils, sugar, and flours), and 20% from group 3 (mostly breads, biscuits, sweets, soft drinks, and sausages). The share of group 3 foods increased with income and represented almost one third of all calories in higher-income households. The impact of the replacement of group 1 foods and group 2 ingredients by group 3 products on the overall quality of the diet, eating patterns, and health is discussed.
Abstract:
Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test whether two sets of protein domain sequences coming from two families of the Pfam database are significantly different. We model protein sequences as realizations of Variable Length Markov Chains (VLMC) and we use the context trees as a signature of each protein family. Our approach is based on a Kolmogorov-Smirnov-type goodness-of-fit test proposed by Balding et al. [Limit theorems for sequences of random trees (2008), DOI: 10.1007/s11749-008-0092-z]. The test statistic is a supremum over the space of trees of a function of the two samples; its computation grows, in principle, exponentially fast with the maximal number of nodes of the potential trees. We show how to transform this problem into a max-flow problem over a related graph, which can be solved using a Ford-Fulkerson algorithm in time polynomial in that number. We apply the test to 10 randomly chosen protein domain families from the seed of the Pfam-A database (high-quality, manually curated families). The test shows that the distributions of context trees coming from different families are significantly different. We emphasize that this is a novel mathematical approach to validate the automatic clustering of sequences in any context. We also study the performance of the test via simulations on Galton-Watson related processes.
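The reduction to max-flow is the computational core of the approach. A generic Edmonds-Karp variant of Ford-Fulkerson (BFS augmenting paths, polynomial time) can be sketched as follows; the specific graph construction the paper uses to encode the test statistic is not reproduced here:

```python
from collections import deque, defaultdict

def max_flow(capacity, source, sink):
    """Edmonds-Karp: Ford-Fulkerson with shortest (BFS) augmenting paths.

    `capacity` maps directed edges (u, v) -> capacity. A residual graph is
    built as a copy, so the input dictionary is left untouched.
    """
    residual = defaultdict(int, capacity)
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for (a, b), cap in list(residual.items()):
                if a == u and cap > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if sink not in parent:
            return flow                          # no augmenting path left
        # recover the path and its bottleneck capacity
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for (a, b) in path:                      # push flow, update residuals
            residual[(a, b)] -= bottleneck
            residual[(b, a)] += bottleneck
        flow += bottleneck
```

On a graph with V nodes and E edges this runs in O(V E^2) augmentations, which is what makes the otherwise exponential supremum computable in polynomial time on the number of nodes.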
Abstract:
Objective: We carry out a systematic assessment of a suite of kernel-based learning machines while coping with the task of epilepsy diagnosis through automatic electroencephalogram (EEG) signal classification. Methods and materials: The kernel machines investigated include the standard support vector machine (SVM), the least squares SVM, the Lagrangian SVM, the smooth SVM, the proximal SVM, and the relevance vector machine. An extensive series of experiments was conducted on publicly available data, whose clinical EEG recordings were obtained from five normal subjects and five epileptic patients. The performance levels delivered by the different kernel machines are contrasted in terms of the criteria of predictive accuracy, sensitivity to the kernel function/parameter value, and sensitivity to the type of features extracted from the signal. For this purpose, 26 values for the kernel parameter (radius) of two well-known kernel functions (namely, Gaussian and exponential radial basis functions) were considered, as well as 21 types of features extracted from the EEG signal, including statistical values derived from the discrete wavelet transform, Lyapunov exponents, and combinations thereof. Results: We first quantitatively assess the impact of the choice of the wavelet basis on the quality of the features extracted. Four wavelet basis functions were considered in this study. Then, we provide the average accuracy (i.e., cross-validation error) values delivered by 252 kernel machine configurations; in particular, 40%/35% of the best-calibrated models of the standard and least squares SVMs reached a 100% accuracy rate for the two kernel functions considered. Moreover, we show the sensitivity profiles exhibited by a large sample of the configurations whereby one can visually inspect their levels of sensitiveness to the type of feature and to the kernel function/parameter value.
Conclusions: Overall, the results show that all kernel machines are competitive in terms of accuracy, with the standard and least squares SVMs prevailing more consistently. Moreover, the choice of the kernel function and parameter value, as well as the choice of the feature extractor, are critical decisions, although the choice of the wavelet family seems not to be so relevant. Also, the statistical values calculated over the Lyapunov exponents were good sources of signal representation, but not as informative as their wavelet counterparts. Finally, a typical sensitivity profile emerged across all types of machines, involving regions of stability separated by zones of sharp variation, with some kernel parameter values clearly associated with better accuracy rates (zones of optimality). (C) 2011 Elsevier B.V. All rights reserved.
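The two kernel families scanned in the study are commonly parameterized as below; the exact scaling of the radius parameter used by the authors may differ, so treat these definitions as one standard convention rather than the paper's:

```python
import numpy as np

def gaussian_rbf(x, y, radius):
    """Gaussian RBF: k(x, y) = exp(-||x - y||^2 / (2 * radius^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * radius ** 2))

def exponential_rbf(x, y, radius):
    """Exponential RBF: k(x, y) = exp(-||x - y|| / (2 * radius^2))."""
    return np.exp(-np.sqrt(np.sum((x - y) ** 2)) / (2.0 * radius ** 2))
```

Sweeping the radius over a grid of values (26 in the study) and recording cross-validated accuracy at each is what produces the sensitivity profiles described above: small radii make the kernel nearly diagonal, large radii make all points look similar, and the zones of optimality lie in between.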
Abstract:
Tetanus still remains a significant health problem in developing countries; it is a serious disease with a high mortality rate. The purpose of this study was to characterize the oral sensorimotor function for feeding in patients with tetanus. Thirteen patients clinically diagnosed with tetanus and admitted to an intensive care unit between December of 2005 and May of 2007 were assessed with a screening tool for dysphagia, involving the assessment of clinical features and two swallowing tests. Results indicate that the oral sensorimotor function for feeding in these patients is severely compromised, with the exception of the clinical feature of palate elevation and performance in the saliva swallowing test. The factor analysis indicated that the evaluation of tongue movement change in the oromotor examination is important in predicting alterations of cough/voice in the water swallowing test, thus suggesting that oral feeding might be unsafe. When looking at developing countries, the prolonged intensive medical and nursing care required by many patients with tetanus places extra demands on an already stretched healthcare budget. Intervention by a speech pathologist could mean that time in the ICU would be reduced, as well as the number of re-admissions due to complications. (C) 2009 Elsevier B.V. All rights reserved.
Abstract:
Background: Endoscopic sclerotherapy (ES) has been the standard treatment for children with idiopathic extrahepatic portal vein obstruction (EHPVO). Portosystemic shunts are indicated when variceal bleeding cannot be controlled by ES. Recently, mesenteric left portal vein bypass was indicated as a surgical intervention and preventative measure for hepatic dysfunction in children with long-term EHPVO. Nevertheless, there is a lack of published data confirming the extent of hepatic dysfunction, hypersplenism, and physical development in children with long-term follow-up. Method: We retrospectively verified the long-term outcomes in 82 children with EHPVO treated with an ES protocol, focusing on mortality, control of bleeding, hypersplenism, and consequent hepatic dysfunction. Results: Of the children, 56% were free from bleeding after the initiation of ES. The most frequent cause of rebleeding was gastric varices (30%). Four patients had recurrent bleeding from esophageal varices (4.6%). Four patients underwent surgery as a consequence of uncontrolled gastric varices. There were no deaths. Most patients showed good physical development. We observed a mild but statistically significant drop in factor V levels, as well as in leukocyte and platelet counts. Conclusion: Endoscopic sclerotherapy is an efficient treatment for children with EHPVO. The incidence of rebleeding is low, and there was no mortality. Children develop mild liver dysfunction and hypersplenism with long-term follow-up. Only a few patients manifest symptoms of hypersplenism, portal biliopathy, or liver dysfunction before adolescence. (C) 2009 Elsevier Inc. All rights reserved.
Abstract:
This Thesis describes the application of automatic learning methods for a) the classification of organic and metabolic reactions, and b) the mapping of Potential Energy Surfaces (PES). The classification of reactions was approached with two distinct methodologies: a representation of chemical reactions based on NMR data, and a representation of chemical reactions from the reaction equation based on the physico-chemical and topological features of chemical bonds. NMR-based classification of photochemical and enzymatic reactions. Photochemical and metabolic reactions were classified by Kohonen Self-Organizing Maps (Kohonen SOMs) and Random Forests (RFs) taking as input the difference between the 1H NMR spectra of the products and the reactants. The development of such a representation can be applied in automatic analysis of changes in the 1H NMR spectrum of a mixture and their interpretation in terms of the chemical reactions taking place. Examples of possible applications are the monitoring of reaction processes, evaluation of the stability of chemicals, or even the interpretation of metabonomic data. A Kohonen SOM trained with a data set of metabolic reactions catalysed by transferases was able to correctly classify 75% of an independent test set in terms of the EC number subclass. Random Forests improved the correct predictions to 79%. With photochemical reactions classified into 7 groups, an independent test set was classified with 86-93% accuracy. The data set of photochemical reactions was also used to simulate mixtures with two reactions occurring simultaneously. Kohonen SOMs and Feed-Forward Neural Networks (FFNNs) were trained to classify the reactions occurring in a mixture based on the 1H NMR spectra of the products and reactants. Kohonen SOMs allowed the correct assignment of 53-63% of the mixtures (in a test set). Counter-Propagation Neural Networks (CPNNs) gave similar results.
The use of supervised learning techniques improved the results: correct assignments rose to 77% when an ensemble of ten FFNNs was used and to 80% when Random Forests were used. This study was performed with NMR data simulated from the molecular structure by the SPINUS program. In the design of one test set, simulated data was combined with experimental data. The results support the proposal of linking databases of chemical reactions to experimental or simulated NMR data for automatic classification of reactions and mixtures of reactions. Genome-scale classification of enzymatic reactions from their reaction equation. The MOLMAP descriptor relies on a Kohonen SOM that defines types of bonds on the basis of their physico-chemical and topological properties. The MOLMAP descriptor of a molecule represents the types of bonds available in that molecule. The MOLMAP descriptor of a reaction is defined as the difference between the MOLMAPs of the products and the reactants, and numerically encodes the pattern of bonds that are broken, changed, and made during a chemical reaction. The automatic perception of chemical similarities between metabolic reactions is required for a variety of applications, ranging from the computer validation of classification systems and genome-scale reconstruction (or comparison) of metabolic pathways to the classification of enzymatic mechanisms. Catalytic functions of proteins are generally described by EC numbers, which are simultaneously employed as identifiers of reactions, enzymes, and enzyme genes, thus linking metabolic and genomic information. Different methods should be available to automatically compare metabolic reactions and to automatically assign EC numbers to reactions not yet officially classified.
In this study, the genome-scale data set of enzymatic reactions available in the KEGG database was encoded by the MOLMAP descriptors, and was submitted to Kohonen SOMs to compare the resulting map with the official EC number classification, to explore the possibility of predicting EC numbers from the reaction equation, and to assess the internal consistency of the EC classification at the class level. A general agreement with the EC classification was observed, i.e. a relationship between the similarity of MOLMAPs and the similarity of EC numbers. At the same time, MOLMAPs were able to discriminate between EC sub-subclasses. EC numbers could be assigned at the class, subclass, and sub-subclass levels with accuracies up to 92%, 80%, and 70% for independent test sets. The correspondence between chemical similarity of metabolic reactions and their MOLMAP descriptors was applied to the identification of a number of reactions mapped into the same neuron but belonging to different EC classes, which demonstrated the ability of the MOLMAP/SOM approach to verify the internal consistency of classifications in databases of metabolic reactions. RFs were also used to assign the four levels of the EC hierarchy from the reaction equation. EC numbers were correctly assigned in 95%, 90%, 85% and 86% of the cases (for independent test sets) at the class, subclass, sub-subclass and full EC number level, respectively. Experiments for the classification of reactions from the main reactants and products were performed with RFs - EC numbers were assigned at the class, subclass and sub-subclass level with accuracies of 78%, 74% and 63%, respectively. In the course of the experiments with metabolic reactions we suggested that the MOLMAP/SOM concept could be extended to the representation of other levels of metabolic information such as metabolic pathways.
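The difference construction behind the MOLMAP reaction descriptor can be sketched in a few lines. The fixed-size bond-type histograms below stand in for real MOLMAPs, whose dimensions and bond typing come from a trained Kohonen SOM, so both the vector length and the values are illustrative:

```python
import numpy as np

def molmap_reaction_descriptor(product_maps, reactant_maps):
    """Reaction descriptor = (sum of product MOLMAPs) - (sum of reactant MOLMAPs).

    Each MOLMAP here is a hypothetical fixed-size bond-type histogram.
    Positive entries correspond to bond types made by the reaction,
    negative entries to bond types broken, zeros to bonds left unchanged.
    """
    products = np.sum(product_maps, axis=0)
    reactants = np.sum(reactant_maps, axis=0)
    return products - reactants
```

The resulting signed vector is what gets fed to the SOMs and Random Forests for EC-number assignment: reactions that break and make similar bond patterns land close together regardless of the full molecular structures involved.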
Following the MOLMAP idea, the pattern of neurons activated by the reactions of a metabolic pathway is a representation of the reactions involved in that pathway - a descriptor of the metabolic pathway. This reasoning enabled the comparison of different pathways, the automatic classification of pathways, and a classification of organisms based on their biochemical machinery. The three levels of classification (from bonds to metabolic pathways) made it possible to map and perceive chemical similarities between metabolic pathways, even for pathways of different types of metabolism and pathways that do not share similarities in terms of EC numbers. Mapping of PES by neural networks (NNs). In a first series of experiments, ensembles of Feed-Forward NNs (EnsFFNNs) and Associative Neural Networks (ASNNs) were trained to reproduce PES represented by the Lennard-Jones (LJ) analytical potential function. The accuracy of the method was assessed by comparing the results of molecular dynamics simulations (thermal, structural, and dynamic properties) obtained from the NN-PES and from the LJ function. The results indicated that for LJ-type potentials, NNs can be trained to generate accurate PES to be used in molecular simulations. EnsFFNNs and ASNNs gave better results than single FFNNs. A remarkable ability of the NN models to interpolate between distant curves and accurately reproduce potentials to be used in molecular simulations is shown. The purpose of the first study was to systematically analyse the accuracy of different NNs. Our main motivation, however, is reflected in the next study: the mapping of multidimensional PES by NNs to simulate, by Molecular Dynamics or Monte Carlo, the adsorption and self-assembly of solvated organic molecules on noble-metal electrodes. Indeed, for such complex and heterogeneous systems the development of suitable analytical functions that fit quantum mechanical interaction energies is a non-trivial or even impossible task.
The data consisted of energy values, from Density Functional Theory (DFT) calculations, at different distances, for several molecular orientations and three electrode adsorption sites. The results indicate that NNs require a data set large enough to cover well the diversity of possible interaction sites, distances, and orientations. NNs trained with such data sets can perform equally well or even better than analytical functions. Therefore, they can be used in molecular simulations, particularly for the ethanol/Au (111) interface which is the case studied in the present Thesis. Once properly trained, the networks are able to produce, as output, any required number of energy points for accurate interpolations.
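As a concrete reference point for the first series of experiments, the Lennard-Jones pair potential used to generate training targets for the networks can be written as follows (the epsilon, sigma, and sampling range are illustrative; the thesis's actual settings are not given in the abstract):

```python
import numpy as np

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """LJ pair potential: V(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6).

    `epsilon` sets the well depth, `sigma` the distance at which V = 0;
    the minimum sits at r = 2**(1/6) * sigma with V = -epsilon.
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

# Training data for an NN surrogate of the PES: sample distances and
# record the analytical energies as regression targets.
distances = np.linspace(0.9, 3.0, 200)
energies = lennard_jones(distances)
```

Fitting a network to (distance, energy) pairs like these, and then comparing simulation observables computed from the surrogate against those from the analytical function, is the validation strategy the abstract describes before moving to the DFT-derived ethanol/Au(111) surface.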
Abstract:
Our purpose is to determine the impact of histological factors observed in zero-time biopsies on early post-transplant kidney allograft function. We specifically want to compare the semi-quantitative Banff classification of zero-time biopsies with quantification of the percentage of cortical area fibrosis. Sixty-three zero-time deceased-donor allograft biopsies were retrospectively semi-quantitatively scored using the Banff classification. By adding the individual chronic parameters, a Banff Chronic Sum (BCS) score was generated. The percentage of cortical area with Picrosirius Red staining (%PSR) was assessed and calculated with a computer program. A negative linear regression between %PSR and GFR at 3 years post-transplantation was established (Y = 62.08 - 4.6412X; p = 0.022). Significant negative correlations with GFR at 3 years were found for arteriolar hyalinosis (rho = -0.375; p = 0.005), chronic interstitial (rho = -0.296; p = 0.02), chronic tubular (rho = -0.276; p = 0.04), and chronic vascular (rho = -0.360; p = 0.007) scores, and for BCS (rho = -0.413; p = 0.002). However, no correlation was found between %PSR and Ci, Ct, or BCS. In multivariate linear regression, the negative predictive factors for 3-year GFR were BCS in the histological model, and donor kidney age, recipient age, and black race in the clinical model. The BCS seems a good and easy-to-perform tool, available to every pathologist, with significant short-term predictive value. The %PSR predicts short-term kidney function in the univariate analysis but involves extra, time-consuming work beyond routine practice. We think that %PSR should be regarded as a research instrument.
Abstract:
Acute renal failure (ARF) is common after orthotopic liver transplantation (OLT). The aim of this study was to evaluate the prognostic value of the RIFLE classification in the development of CKD, hemodialysis requirement, and mortality. Patients were categorized as risk (R), injury (I) or failure (F) according to renal function at days 1, 7 and 21. Final renal function was classified according to K/DIGO guidelines. We studied 708 OLT recipients, transplanted between September 1992 and March 2007; mean age 44 ± 12.6 yr, mean follow-up 3.6 yr (28.8% ≥ 5 yr). Renal dysfunction before OLT was known in 21.6%. According to the RIFLE classification, ARF occurred in 33.2%: 16.8% were R class, 8.5% I class and 7.9% F class. CKD developed in 45.6%, with stages 4 or 5d in 11.3%. Mortality for R, I and F classes was, respectively, 10.9%, 13.3% and 39.3%. Severity of ARF correlated with development of CKD: stage 3 was associated with all classes of ARF, stages 4 and 5d only with severe ARF. Hemodialysis requirement (23%) and mortality were only correlated with the most severe form of ARF (F class). In conclusion, the RIFLE classification is a useful tool to stratify the severity of early ARF, providing a prognostic indicator for the risk of CKD occurrence and death.
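The creatinine arm of the RIFLE criteria can be sketched as a simple staging function. This simplification omits the GFR and urine-output criteria of the full classification, and the thresholds follow the commonly cited 1.5x/2x/3x increases over baseline:

```python
def rifle_class(baseline_creatinine, current_creatinine):
    """Simplified RIFLE staging from the serum-creatinine criterion only.

    Risk (R):    creatinine increased >= 1.5x over baseline
    Injury (I):  increased >= 2x
    Failure (F): increased >= 3x
    The urine-output and GFR criteria of the full RIFLE system, and the
    outcome classes L and E, are deliberately left out of this sketch.
    """
    ratio = current_creatinine / baseline_creatinine
    if ratio >= 3.0:
        return "F"
    if ratio >= 2.0:
        return "I"
    if ratio >= 1.5:
        return "R"
    return "no ARF"
```

Applying such a staging rule at fixed post-transplant time points (days 1, 7 and 21 in the study) is what allows severity of early ARF to be related to later CKD and mortality.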
Abstract:
Dissertation presented to obtain the Ph.D. degree in Biochemistry
Abstract:
Objective: To evaluate the impact that the distribution of emphysema has on clinical and functional severity in patients with COPD. Methods: The distribution of the emphysema was analyzed in COPD patients, who were classified according to a 5-point visual classification system of lung CT findings. We assessed the influence of emphysema distribution type on the clinical and functional presentation of COPD. We also evaluated hypoxemia after the six-minute walk test (6MWT) and determined the six-minute walk distance (6MWD). Results: Eighty-six patients were included. The mean age was 65.2 ± 12.2 years, 91.9% were male, and all but one were smokers (mean smoking history, 62.7 ± 38.4 pack-years). The emphysema distribution was categorized as obviously upper lung-predominant (type 1), in 36.0% of the patients; slightly upper lung-predominant (type 2), in 25.6%; homogeneous between the upper and lower lung (type 3), in 16.3%; and slightly lower lung-predominant (type 4), in 22.1%. Type 2 emphysema distribution was associated with lower FEV1, FVC, FEV1/FVC ratio, and DLCO. In comparison with the type 1 patients, the type 4 patients were more likely to have an FEV1 < 65% of the predicted value (OR = 6.91, 95% CI: 1.43-33.45; p = 0.016), a 6MWD < 350 m (OR = 6.36, 95% CI: 1.26-32.18; p = 0.025), and post-6MWT hypoxemia (OR = 32.66, 95% CI: 3.26-326.84; p = 0.003). The type 3 patients had a higher RV/TLC ratio, although the difference was not significant. Conclusions: The severity of COPD appears to be greater in type 4 patients, and type 3 patients tend to have greater hyperinflation. The distribution of emphysema could have a major impact on functional parameters and should be considered in the evaluation of COPD patients.
Abstract:
We determine the optimal combination of a universal benefit, B, and a categorical benefit, C, for an economy in which individuals differ in both their ability to work - modelled as an exogenous zero quantity constraint on labour supply - and, conditional on being able to work, their productivity at work. C is targeted at those unable to work, and is conditioned in two dimensions: ex-ante, an individual must be unable to work and be awarded the benefit, whilst ex-post, a recipient must not subsequently work. However, the ex-ante conditionality may be imperfectly enforced due to Type I (false rejection) and Type II (false award) classification errors, whilst, in addition, the ex-post conditionality may be imperfectly enforced. If there are no classification errors - and thus no enforcement issues - it is always optimal to set C>0, whilst B=0 only if the benefit budget is sufficiently small. However, when classification errors occur, B=0 only if there are no Type I errors and the benefit budget is sufficiently small, while the conditions under which C>0 depend on the enforcement of the ex-post conditionality. We consider two discrete alternatives. Under No Enforcement, C>0 only if the test administering C has some discriminatory power. In addition, social welfare is decreasing in the propensity to make each type of error. However, under Full Enforcement, C>0 for all levels of discriminatory power. Furthermore, whilst social welfare is decreasing in the propensity to make Type I errors, there are certain conditions under which it is increasing in the propensity to make Type II errors. This implies that there may be conditions under which it would be welfare enhancing to lower the chosen eligibility threshold - supporting the suggestion by Goodin (1985) to "err on the side of kindness".
Abstract:
Defining an efficient training set is one of the most delicate phases for the success of remote sensing image classification routines. The complexity of the problem, the limited temporal and financial resources, as well as the high intraclass variance, can make an algorithm fail if it is trained with a suboptimal dataset. Active learning aims at building efficient training sets by iteratively improving the model performance through sampling. A user-defined heuristic ranks the unlabeled pixels according to a function of the uncertainty of their class membership, and the user is then asked to provide labels for the most uncertain pixels. This paper reviews and tests the main families of active learning algorithms: committee-based, large margin, and posterior probability-based. For each of them, the most recent advances in the remote sensing community are discussed and some heuristics are detailed and tested. Several challenging remote sensing scenarios are considered, including very high spatial resolution and hyperspectral image classification. Finally, guidelines for choosing a suitable architecture are provided for new and/or inexperienced users.
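One of the simplest posterior probability-based heuristics of the kind reviewed above is margin (breaking-ties) sampling; the function name and interface below are illustrative, not taken from the paper:

```python
import numpy as np

def margin_sampling(posteriors, n_queries=1):
    """Rank unlabeled pixels by the margin between their two most probable
    classes; the smallest margins are the most uncertain and get queried.

    `posteriors`: array of shape (n_samples, n_classes) holding the
    class-membership probabilities predicted by the current model.
    Returns the indices of the `n_queries` most uncertain samples.
    """
    p = np.sort(posteriors, axis=1)
    margins = p[:, -1] - p[:, -2]          # best minus second-best class
    return np.argsort(margins)[:n_queries]
```

In the active learning loop, the selected indices are shown to the user for labeling, added to the training set, and the model is retrained, so each iteration concentrates labeling effort where the classifier is least certain.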
Abstract:
This study presents classification criteria for two-class Cannabis seedlings. As the cultivation of drug-type cannabis is forbidden in Switzerland, law enforcement authorities regularly ask laboratories to determine the chemotype of cannabis plants from seized material in order to ascertain whether a plantation is legal. In this study, the classification analysis is based on data obtained from the relative proportions of three major leaf compounds measured by gas chromatography interfaced with mass spectrometry (GC-MS). The aim is to discriminate between drug-type (illegal) and fiber-type (legal) cannabis at an early stage of growth. A Bayesian procedure is proposed: a Bayes factor is computed and classification is performed on the basis of the decision maker's specifications (i.e., prior probability distributions on cannabis type and consequences of classification measured by losses). Classification rates are computed with two statistical models and the results are compared. Sensitivity analysis is then performed to analyze the robustness of the classification criteria.
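The Bayes-factor decision rule with decision-maker-specified priors and losses can be sketched as below; the interface and the two-action loss structure are illustrative assumptions, not the paper's exact statistical models:

```python
def classify_by_bayes_factor(likelihood_drug, likelihood_fiber,
                             prior_drug=0.5,
                             loss_false_drug=1.0, loss_false_fiber=1.0):
    """Two-class Bayesian decision sketch for the chemotype problem.

    The Bayes factor compares the evidence for the two chemotypes;
    deciding "drug" is optimal when the posterior odds exceed the ratio
    of the loss of wrongly calling a fiber-type plant "drug" to the loss
    of wrongly calling a drug-type plant "fiber".
    """
    bayes_factor = likelihood_drug / likelihood_fiber
    prior_odds = prior_drug / (1.0 - prior_drug)
    threshold = loss_false_drug / loss_false_fiber
    return "drug" if bayes_factor * prior_odds > threshold else "fiber"
```

Raising `loss_false_drug` makes the rule more conservative about declaring an illegal plantation, which is exactly the kind of specification a decision maker supplies in the Bayesian procedure described above.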
Abstract:
In recent years, kernel methods have proven to be very powerful tools in many application domains in general and in remote sensing image classification in particular. The special characteristics of remote sensing images (high dimension, few labeled samples and different noise sources) are efficiently dealt with by kernel machines. In this paper, we propose the use of structured output learning to improve kernel-based remote sensing image classification. Structured output learning is concerned with the design of machine learning algorithms that not only implement input-output mappings, but also take into account the relations between output labels, thus generalizing unstructured kernel methods. We analyze the framework and introduce it to the remote sensing community. Output similarity is here encoded into SVM classifiers by modifying the model loss function and the kernel function, either independently or jointly. Experiments on a very high resolution (VHR) image classification problem show promising results and open a wide field of research with structured output kernel methods.