447 results for classifiers
Abstract:
Miniature diffusion size classifiers (miniDiSC) are novel handheld devices to measure ultrafine particles (UFP). UFP have been linked to the development of cardiovascular and pulmonary diseases; thus, detection and quantification of these particles are important for evaluating their potential health hazards. As part of the UFP exposure assessments of highway maintenance workers in western Switzerland, we compared a miniDiSC with a portable condensation particle counter (P-TRAK). In addition, we performed stationary measurements with a miniDiSC and a scanning mobility particle sizer (SMPS) at a site immediately adjacent to a highway. Measurements with miniDiSC and P-TRAK correlated well (correlation of r = 0.84) but average particle numbers of the miniDiSC were 30%–60% higher. This difference was significantly increased for mean particle diameters below 40 nm. The correlation between the miniDiSC and the SMPS during stationary measurements was very high (r = 0.98) although particle numbers from the miniDiSC were 30% lower. Differences between the three devices were attributed to the different cutoff diameters for detection. Correction for this size-dependent effect led to very similar results across all counters. We did not observe any significant influence of other particle characteristics. Our results suggest that the miniDiSC provides accurate particle number concentrations and geometric mean diameters at traffic-influenced sites, making it a useful tool for personal exposure assessment in such settings.
Abstract:
Many classifiers achieve high levels of accuracy but have limited applicability in real-world situations because they do not lead to a greater understanding of, or insight into, the way features influence the classification. In areas such as health informatics, a classifier that clearly identifies the influences on classification can be used to direct research and formulate interventions. This research investigates the practical applications of Automated Weighted Sum (AWSum), a classifier that provides accuracy comparable to other techniques whilst providing insight into the data. This is achieved by calculating a weight for each feature value that represents its influence on the class value. The merits of this approach in classification and insight are evaluated on Cystic Fibrosis and Diabetes datasets with positive results.
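The idea of one weight per feature value can be illustrated with a minimal sketch. The abstract does not give the AWSum weighting formula, so the code below assumes a simple two-class stand-in: the weight of a feature value is P(positive | value) minus P(negative | value), which lies in [-1, 1]. The function names, the threshold, and the toy records are hypothetical.

```python
from collections import defaultdict

def influence_weights(rows, labels, positive):
    """Return {(feature_index, value): weight in [-1, 1]}.

    Assumed weighting (not taken from the abstract):
    weight = P(positive | value) - P(negative | value)
           = 2 * P(positive | value) - 1 for two classes.
    """
    counts = defaultdict(lambda: [0, 0])  # [positive count, total count]
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            entry = counts[(i, value)]
            entry[0] += label == positive
            entry[1] += 1
    return {key: 2 * pos / total - 1 for key, (pos, total) in counts.items()}

def classify(row, weights, positive, negative, threshold=0.0):
    """Sum the weights of the observed feature values and threshold the sum."""
    score = sum(weights.get((i, v), 0.0) for i, v in enumerate(row))
    return (positive if score > threshold else negative), score

# Invented toy records: (feature 0, feature 1) and a class label.
rows = [("high", "yes"), ("high", "no"), ("low", "no"), ("low", "no")]
labels = ["pos", "pos", "neg", "neg"]
weights = influence_weights(rows, labels, positive="pos")
```

Because every weight is attached to a named feature value, sorting `weights` by magnitude gives a direct reading of which values push a case towards each class, which is the kind of insight the abstract describes.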
Abstract:
Genetically homogenous C57Bl/6 mice display differential metabolic adaptation when fed a high fat diet for 9 months. Most become obese and diabetic, but a significant fraction remains lean and diabetic or lean and non-diabetic. Here, we performed microarray analysis of "metabolic" transcripts expressed in liver and hindlimb muscles to evaluate: (i) whether expressed transcript patterns could indicate changes in metabolic pathways associated with the different phenotypes, (ii) how these changes differed from the early metabolic adaptation to short term high fat feeding, and (iii) whether gene classifiers could be established that were characteristic of each metabolic phenotype. Our data indicate that obesity/diabetes was associated with preserved hepatic lipogenic gene expression and increased plasma levels of very low density lipoprotein and, in muscle, with an increase in lipoprotein lipase gene expression. This suggests increased muscle fatty acid uptake, which may favor insulin resistance. In contrast, the lean mice showed a strong reduction in the expression of hepatic lipogenic genes, in particular of Scd-1, a gene linked to sensitivity to diet-induced obesity; the lean and non-diabetic mice presented an additional increased expression of eNos in liver. After 1 week of high fat feeding the liver gene expression pattern was distinct from that seen at 9 months in any of the three mouse groups, thus indicating progressive establishment of the different phenotypes. Strikingly, development of the obese phenotype involved re-expression of Scd-1 and other lipogenic genes. Finally, gene classifiers could be established that were characteristic of each metabolic phenotype. Together, these data suggest that epigenetic mechanisms influence gene expression patterns and metabolic fates.
Abstract:
Counterfeit pharmaceutical products have become a widespread problem in the last decade. Various analytical techniques have been applied to discriminate between genuine and counterfeit products. Among these, Near-infrared (NIR) and Raman spectroscopy provided promising results. The present study offers a methodology that provides more valuable information for organisations engaged in the fight against counterfeiting of medicines. A database was established by analyzing counterfeits of a particular pharmaceutical product using Near-infrared (NIR) and Raman spectroscopy. Unsupervised chemometric techniques (i.e. principal component analysis - PCA and hierarchical cluster analysis - HCA) were implemented to identify the classes within the datasets. Gas Chromatography coupled to Mass Spectrometry (GC-MS) and Fourier Transform Infrared Spectroscopy (FT-IR) were used to determine the number of different chemical profiles within the counterfeits. A comparison with the classes established by NIR and Raman spectroscopy made it possible to evaluate the discriminating power provided by these techniques. Supervised classifiers (i.e. k-Nearest Neighbors, Partial Least Squares Discriminant Analysis, Probabilistic Neural Networks and Counterpropagation Artificial Neural Networks) were applied to the acquired NIR and Raman spectra and the results were compared to the ones provided by the unsupervised classifiers. The retained strategy for routine applications, founded on the classes identified by NIR and Raman spectroscopy, uses a classification algorithm based on distance measures and Receiver Operating Characteristics (ROC) curves. The model is able to compare the spectrum of a new counterfeit with that of previously analyzed products and to determine if a new specimen belongs to one of the existing classes, thereby establishing a link with other counterfeits of the database.
Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.
Abstract:
BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data.
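The nested cross-validation scheme mentioned under METHODS can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: it uses a one-dimensional k-nearest-neighbour classifier, tunes only the number of neighbours in the inner loop (the study also performs feature selection there), and runs on invented toy data with no batch effects. All function names and parameters are hypothetical.

```python
import random

def k_folds(indices, k, seed):
    """Shuffle indices and split them into k roughly equal folds."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (1-D feature, labels 0/1)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0

def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def nested_cv(data, outer_k=5, inner_k=4, candidates=(1, 3, 5), seed=0):
    """Return the mean outer-fold accuracy of a kNN tuned in an inner loop."""
    outer = k_folds(range(len(data)), outer_k, seed)
    scores = []
    for held_out in outer:
        train_idx = [i for fold in outer if fold is not held_out for i in fold]
        # Inner loop: choose k using the outer-training samples only.
        inner = k_folds(train_idx, inner_k, seed + 1)

        def inner_score(k):
            total = 0.0
            for val in inner:
                fit = [data[i] for f in inner if f is not val for i in f]
                total += accuracy(fit, [data[i] for i in val], k)
            return total / inner_k

        best_k = max(candidates, key=inner_score)
        # Outer loop: score the tuned model on samples it never saw.
        scores.append(accuracy([data[i] for i in train_idx],
                               [data[i] for i in held_out], best_k))
    return sum(scores) / len(scores)

# Invented toy data: (feature, group), group 0 = 'control', group 1 = 'treated'.
rng = random.Random(42)
data = ([(rng.gauss(0.0, 1.0), 0) for _ in range(20)]
        + [(rng.gauss(3.0, 1.0), 1) for _ in range(20)])
estimate = nested_cv(data)
```

Keeping all tuning inside the inner loop prevents the held-out fold from influencing model choice. It does not by itself protect against the bias studied here: if group and batch are confounded, every fold shares that confounding, so the estimate still has to be compared with performance on independent data, as the abstract describes.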
Abstract:
This article presents an experimental study of the classification ability of several classifiers for multi-class classification of cannabis seedlings. As the cultivation of drug type cannabis is forbidden in Switzerland, law enforcement authorities regularly ask forensic laboratories to determine the chemotype of a seized cannabis plant and then to conclude whether the plantation is legal or not. This classification is mainly performed when the plant is mature, as required by the EU official protocol, so the classification of cannabis seedlings is a time-consuming and costly procedure. A previous study by the authors investigated this problem [1] and showed that it is possible to differentiate between drug type (illegal) and fibre type (legal) cannabis at an early stage of growth using gas chromatography interfaced with mass spectrometry (GC-MS) based on the relative proportions of eight major leaf compounds. The aims of the present work are, on the one hand, to continue the former work and to optimize the methodology for the discrimination of drug- and fibre type cannabis developed in the previous study and, on the other hand, to investigate the possibility of predicting illegal cannabis varieties. Seven classifiers for differentiating between cannabis seedlings are evaluated in this paper, namely Linear Discriminant Analysis (LDA), Partial Least Squares Discriminant Analysis (PLS-DA), Nearest Neighbour Classification (NNC), Learning Vector Quantization (LVQ), Radial Basis Function Support Vector Machines (RBF SVMs), Random Forest (RF) and Artificial Neural Networks (ANN). The performance of each method was assessed using the same analytical dataset, which consists of 861 samples split into drug- and fibre type cannabis, with drug type cannabis being made up of 12 varieties (i.e. 12 classes). The results show that linear classifiers are not able to manage the distribution of classes in which some overlap areas exist for both classification problems.
Unlike linear classifiers, NNC and RBF SVMs best differentiate cannabis samples both for 2-class and 12-class classifications, with average classification results up to 99% and 98%, respectively. Furthermore, RBF SVMs correctly classified as drug type cannabis the independent validation set, which consists of cannabis plants coming from police seizures. For forensic casework, this study shows that discrimination between cannabis samples at an early stage of growth is possible, with fairly high classification performance for discriminating between cannabis chemotypes or between drug type cannabis varieties.
Abstract:
Peripheral T-cell lymphoma (PTCL) encompasses a heterogeneous group of neoplasms with generally poor clinical outcome. Currently 50% of PTCL cases are not classifiable: PTCL-not otherwise specified (NOS). Gene-expression profiles on 372 PTCL cases were analyzed and robust molecular classifiers and oncogenic pathways that reflect the pathobiology of tumor cells and their microenvironment were identified for major PTCL-entities, including 114 angioimmunoblastic T-cell lymphoma (AITL), 31 anaplastic lymphoma kinase (ALK)-positive and 48 ALK-negative anaplastic large cell lymphoma, 14 adult T-cell leukemia/lymphoma and 44 extranodal NK/T-cell lymphoma that were further separated into NK-cell and γδ T-cell lymphomas. Thirty-seven percent of morphologically diagnosed PTCL-NOS cases were reclassified into other specific subtypes by molecular signatures. Reexamination, immunohistochemistry, and IDH2 mutation analysis in reclassified cases supported the validity of the reclassification. Two major molecular subgroups can be identified in the remaining PTCL-NOS cases characterized by high expression of either GATA3 (33%; 40/121) or TBX21 (49%; 59/121). The GATA3 subgroup was significantly associated with poor overall survival (P = .01). High expression of cytotoxic gene-signature within the TBX21 subgroup also showed poor clinical outcome (P = .05). In AITL, high expression of several signatures associated with the tumor microenvironment was significantly associated with outcome. A combined prognostic score was predictive of survival in an independent cohort (P = .004).
Abstract:
Abstract: The occupational health risk involved with handling nanoparticles is the probability that a worker will experience an adverse health effect: this is calculated as a function of the worker's exposure relative to the potential biological hazard of the material. Addressing the risks of nanoparticles therefore requires knowledge of occupational exposure and the release of nanoparticles into the environment as well as toxicological data. However, information on exposure is currently not systematically collected; therefore this risk assessment lacks quantitative data. This thesis aimed, first, at creating the fundamental data necessary for a quantitative assessment and, second, at evaluating methods to measure occupational nanoparticle exposure. The first goal was to determine what is being used where in Swiss industries. This was followed by an evaluation of the adequacy of existing measurement methods to assess workplace nanoparticle exposure to complex size distributions and concentration gradients. The study was conceived as a series of methodological evaluations aimed at better understanding nanoparticle measurement devices and methods. It focused on inhalation exposure to airborne particles, as respiration is considered to be the most important entrance pathway for nanoparticles in the body in terms of risk. The targeted survey (pilot study) was conducted as a feasibility study for a later nationwide survey on the handling of nanoparticles and the application of specific protective measures in industry. The study consisted of targeted phone interviews with health and safety officers of Swiss companies that were believed to use or produce nanoparticles. This was followed by a representative survey on the level of nanoparticle usage in Switzerland. It was designed based on the results of the pilot study.
The study was conducted among a representative selection of clients of the Swiss National Accident Insurance Fund (SUVA), covering about 85% of Swiss production companies. The third part of this thesis focused on the methods to measure nanoparticles. Several pre-studies were conducted studying the limits of commonly used measurement devices in the presence of nanoparticle agglomerates. This focus was chosen because several discussions with users and producers of the measurement devices raised questions about their accuracy in measuring nanoparticle agglomerates and because, at the same time, the two survey studies revealed that such powders are frequently used in industry. The first preparatory experiment focused on the accuracy of the scanning mobility particle sizer (SMPS), which showed an improbable size distribution when measuring powders of nanoparticle agglomerates. Furthermore, the thesis includes a series of smaller experiments that took a closer look at problems encountered with other measurement devices in the presence of nanoparticle agglomerates: condensation particle counters (CPC), the portable aerosol spectrometer (PAS), a device to estimate the aerodynamic diameter, as well as diffusion size classifiers. Some initial feasibility tests for the efficiency of filter-based sampling and subsequent counting of carbon nanotubes (CNT) were conducted last. The pilot study provided a detailed picture of the types and amounts of nanoparticles used and the knowledge of the health and safety experts in the companies. Considerable maximal quantities (> 1'000 kg/year per company) of Ag, Al-Ox, Fe-Ox, SiO2, TiO2, and ZnO (mainly first generation particles) were declared by the contacted Swiss companies. The median quantity of handled nanoparticles, however, was 100 kg/year. The representative survey was conducted by contacting by post a representative selection of 1'626 SUVA clients (Swiss Accident Insurance Fund).
It allowed estimation of the number of companies and workers dealing with nanoparticles in Switzerland. The extrapolation from the surveyed companies to all companies of the Swiss production sector suggested that 1'309 workers (95% confidence interval 1'073 to 1'545) of the Swiss production sector are potentially exposed to nanoparticles in 586 companies (145 to 1'027). These numbers correspond to 0.08% (0.06% to 0.09%) of all workers and to 0.6% (0.2% to 1.1%) of companies in the Swiss production sector. To measure airborne concentrations of sub-micrometre-sized particles, a few well-known methods exist. However, it was unclear how well the different instruments perform in the presence of the often quite large agglomerates of nanostructured materials. The evaluation of devices and methods focused on nanoparticle agglomerate powders. It allowed the identification of the following potential sources of inaccurate measurements at workplaces with considerably high concentrations of airborne agglomerates: - A standard SMPS showed bi-modal particle size distributions when measuring large nanoparticle agglomerates. - Differences in the range of a factor of a thousand were shown between diffusion size classifiers and CPC/SMPS. - The comparison between CPC/SMPS and portable aerosol spectrometer (PAS) was much better, but depending on the concentration, size or type of the powders measured, the differences were still of a high order of magnitude. - Specific difficulties and uncertainties in the assessment of workplaces were identified: the background particles can interact with particles created by a process, which makes the handling of background concentration difficult. - Electric motors produce high numbers of nanoparticles and confound the measurement of the process-related exposure. Conclusion: The surveys showed that nanoparticle applications exist in many industrial sectors in Switzerland and that some companies already use high quantities of them.
The representative survey demonstrated a low prevalence of nanoparticle usage in most branches of the Swiss industry and led to the conclusion that the introduction of applications using nanoparticles (especially outside industrial chemistry) is only beginning. Even though the number of potentially exposed workers was reportedly rather small, it nevertheless underscores the need for exposure assessments. Understanding exposure and how to measure it correctly is very important because the potential health effects of nanomaterials are not yet fully understood. The evaluation showed that many devices and methods of measuring nanoparticles need to be validated for nanoparticle agglomerates before large exposure assessment studies can begin. Summary: The health risk of nanoparticles at the workplace is the probability that a worker suffers a possible adverse health effect when exposed to this substance; it is usually calculated as the product of hazard and exposure. A thorough assessment of the possible risks of nanomaterials therefore requires, on the one hand, information on the release of such materials into the environment and, on the other hand, information on the exposure of workers. Much of this information is not yet collected systematically and is therefore missing for risk analyses. The doctoral thesis aimed to create the basis for a quantitative estimate of exposure to nanoparticles at the workplace and to evaluate the methods needed to measure such exposure.
The study set out to investigate to what extent nanoparticles are already used in Swiss industry, how many workers potentially come into contact with them, and whether measurement technology is already adequate for the necessary workplace exposure measurements. The study focused on exposure to airborne particles, because respiration is regarded as the main entry route for particles into the body. The thesis builds on three phases: a qualitative survey (pilot study), a representative Swiss survey, and several technical studies that serve a specific understanding of the capabilities and limits of individual measurement devices and techniques. The qualitative telephone survey was conducted as a preliminary study for a national, representative survey of Swiss industry. It aimed to gather information on the occurrence of nanoparticles and the protective measures applied. The study consisted of targeted telephone interviews with occupational health and safety specialists at Swiss companies. The companies were selected on the basis of publicly available information indicating that they handle nanoparticles. The second part of the thesis was the representative study to evaluate the prevalence of nanoparticle applications in Swiss industry. The study built on information from the pilot study and was conducted with a representative selection of companies insured by the Swiss National Accident Insurance Fund (SUVA). The majority of Swiss companies in the industrial sector were thereby covered. The third part of the thesis focused on the methodology for measuring nanoparticles. Several preliminary studies were carried out to probe the limits of commonly used nanoparticle measurement devices when they have to measure larger quantities of nanoparticle agglomerates.
This focus was chosen for two reasons: because several discussions with users and with the manufacturer of the measurement devices suggested a weakness there that raised doubts about the accuracy of the devices, and because the two survey studies showed that such nanoparticle agglomerates occur frequently. First, a preliminary study addressed the accuracy of the Scanning Mobility Particle Sizer (SMPS). In the presence of nanoparticle agglomerates, this device displayed an implausible bimodal particle size distribution. A series of short experiments followed that concentrated on other measurement devices and their problems in measuring nanoparticle agglomerates. The Condensation Particle Counter (CPC), the portable aerosol spectrometer (PAS), a device for estimating the aerodynamic diameter of particles, and the Diffusion Size Classifier were tested. Finally, some initial feasibility tests were carried out to determine the efficiency of filter-based measurement of airborne carbon nanotubes (CNT). The pilot study provided a detailed picture of the types and quantities of nanoparticles used in Swiss companies and showed the state of knowledge of the interviewed health protection and safety specialists. The following types of nanoparticles were declared by the contacted companies as maximum quantities (> 1'000 kg per year per company): Ag, Al-Ox, Fe-Ox, SiO2, TiO2, and ZnO (mainly first-generation nanoparticles). The quantities of nanoparticles used varied widely, with a median of 100 kg per year. In the quantitative questionnaire study, 1'626 companies were contacted by letter, all of them clients of the Swiss National Accident Insurance Fund (SUVA). The survey results allowed an estimate of the number of companies and workers that use nanoparticles in Switzerland.
The extrapolation to the Swiss industrial sector gave the following picture: in 586 companies (95% confidence interval: 145 to 1'027 companies), 1'309 workers are potentially exposed to nanoparticles (95% CI: 1'073 to 1'545). These figures correspond to 0.6% of Swiss companies (95% CI: 0.2% to 1.1%) and 0.08% of the workforce (95% CI: 0.06% to 0.09%). There are some well-established technologies for measuring the airborne concentration of submicrometre particles. However, there is doubt as to how far these technologies can also be used to measure engineered nanoparticles. For this reason, the preparatory studies for the workplace assessments focused on measuring powders that contain nanoparticle agglomerates. They allowed the identification of the following possible sources of erroneous measurements at workplaces with elevated airborne concentrations of nanoparticle agglomerates: - A standard SMPS showed an implausible bimodal particle size distribution when measuring larger nanoparticle agglomerates. - Large differences, in the range of a factor of a thousand, were found between a Diffusion Size Classifier and several CPCs (or the SMPS). - The differences between CPC/SMPS and the PAS were smaller, but depending on the size or type of the powder measured they were still a good order of magnitude. - Specific difficulties and uncertainties in workplace measurements were identified: background particles can interact with particles released during a work process. Such interactions make it harder to account correctly for the background particle concentration in the measurement data. - Electric motors produce large quantities of nanoparticles and can thus interfere with the measurement of process-related exposure.
Conclusion: The surveys showed that nanoparticles are already a reality in Swiss industry and that some companies already use large quantities of them. The representative survey tempered this explosive news somewhat, however, by showing that the number of such companies across Swiss industry as a whole is relatively small. In most sectors (especially outside the chemical industry), few or no applications were found, which suggests that the introduction of this new technology is only at the beginning of its development. Even though the number of potentially exposed workers is still relatively small, the study nevertheless underlines the need for exposure measurements at these workplaces. Knowledge about exposure and about how to measure it correctly is very important, above all because the possible health effects are not yet fully understood. The evaluation of some devices and methods showed, however, that there is still catching up to do here. Before larger measurement studies can be carried out, the devices and methods must be validated for use with nanoparticle agglomerates.
Abstract:
Soil surveys are the main source of spatial information on soils and have a range of different applications, mainly in agriculture. The continuity of this activity has however been severely compromised, mainly due to a lack of governmental funding. The purpose of this study was to evaluate the feasibility of two different classifiers (artificial neural networks and a maximum likelihood algorithm) in the prediction of soil classes in the northwest of the state of Rio de Janeiro. Terrain attributes such as elevation, slope, aspect, plan curvature and compound topographic index (CTI) and indices of clay minerals, iron oxide and Normalized Difference Vegetation Index (NDVI), derived from Landsat 7 ETM+ sensor imagery, were used as discriminating variables. The two classifiers were trained and validated for each soil class using 300 and 150 samples respectively, representing the characteristics of these classes in terms of the discriminating variables. According to the statistical tests, the accuracy of the classifier based on artificial neural networks (ANNs) was greater than that of the classic Maximum Likelihood Classifier (MLC). Comparing the results with 126 points of reference showed that the resulting ANN map (73.81% agreement) was superior to the MLC map (57.94%). The main errors when using the two classifiers were caused by: a) the geological heterogeneity of the area coupled with problems related to the geological map; b) the depth of lithic contact and/or rock exposure, and c) problems with the environmental correlation model used due to the polygenetic nature of the soils. This study confirms that the use of terrain attributes together with remote sensing data by an ANN approach can be a tool to facilitate soil mapping in Brazil, primarily due to the availability of low-cost remote sensing data and the ease with which terrain attributes can be obtained.
Abstract:
In this paper, we develop a data-driven methodology to characterize the likelihood of orographic precipitation enhancement using sequences of weather radar images and a digital elevation model (DEM). Geographical locations with topographic characteristics favorable to enforce repeatable and persistent orographic precipitation such as stationary cells, upslope rainfall enhancement, and repeated convective initiation are detected by analyzing the spatial distribution of a set of precipitation cells extracted from radar imagery. Topographic features such as terrain convexity and gradients computed from the DEM at multiple spatial scales as well as velocity fields estimated from sequences of weather radar images are used as explanatory factors to describe the occurrence of localized precipitation enhancement. The latter is represented as a binary process by defining a threshold on the number of cell occurrences at particular locations. Both two-class and one-class support vector machine classifiers are tested to separate the presumed orographic cells from the nonorographic ones in the space of contributing topographic and flow features. Site-based validation is carried out to estimate realistic generalization skills of the obtained spatial prediction models. Due to the high class separability, the decision function of the classifiers can be interpreted as a likelihood or susceptibility of orographic precipitation enhancement. The developed approach can serve as a basis for refining radar-based quantitative precipitation estimates and short-term forecasts or for generating stochastic precipitation ensembles conditioned on the local topography.
Abstract:
We conduct a large-scale comparative study on linearly combining superparent-one-dependence estimators (SPODEs), a popular family of semi-naive Bayesian classifiers. Altogether, 16 model selection and weighting schemes, 58 benchmark data sets, and various statistical tests are employed. This paper's main contributions are threefold. First, it formally presents each scheme's definition, rationale, and time complexity and hence can serve as a comprehensive reference for researchers interested in ensemble learning. Second, it offers bias-variance analysis for each scheme's classification error performance. Third, it identifies effective schemes that meet various needs in practice. This leads to accurate and fast classification algorithms which have an immediate and significant impact on real-world applications. Another important feature of our study is using a variety of statistical tests to evaluate multiple learning methods across multiple data sets.
Abstract:
A common way to model multiclass classification problems is by means of Error-Correcting Output Codes (ECOCs). Given a multiclass problem, the ECOC technique designs a code word for each class, where each position of the code identifies the membership of the class for a given binary problem. A classification decision is obtained by assigning the label of the class with the closest code. One of the main requirements of the ECOC design is that the base classifier is capable of splitting each subgroup of classes from each binary problem. However, we cannot guarantee that a linear classifier can model convex regions. Furthermore, nonlinear classifiers also fail to manage some types of surfaces. In this paper, we present a novel strategy to model multiclass classification problems using subclass information in the ECOC framework. Complex problems are solved by splitting the original set of classes into subclasses and embedding the binary problems in a problem-dependent ECOC design. Experimental results show that the proposed splitting procedure yields a better performance when the class overlap or the distribution of the training objects conceals the decision boundaries for the base classifier. The results are even more significant when one has a sufficiently large training size.
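The decoding step described above (assign the class whose code word is closest) can be sketched in a few lines. This shows only standard ECOC decoding with Hamming distance, not the subclass-splitting design the paper proposes. The code matrix, class names, and function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions in which two code words differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(outputs, code_matrix):
    """Return the class whose code word is closest to the binary outputs."""
    return min(code_matrix, key=lambda cls: hamming(outputs, code_matrix[cls]))

# Hypothetical 4-class problem encoded with 6 binary problems (+1 / -1).
# Every pair of code words differs in 4 positions, so a single wrong
# binary classifier output can still be decoded correctly.
CODES = {
    "A": (+1, +1, +1, -1, -1, -1),
    "B": (+1, -1, -1, +1, +1, -1),
    "C": (-1, +1, -1, +1, -1, +1),
    "D": (-1, -1, +1, -1, +1, +1),
}

# Outputs of the six base classifiers for one sample of class "B",
# with the last classifier wrong.
predicted = ecoc_decode((+1, -1, -1, +1, +1, +1), CODES)
```

Each column of the matrix defines one binary problem that the base classifier must be able to separate; the abstract's point is that when it cannot, splitting a class into subclasses changes the columns so that the binary problems become separable.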
Abstract:
In this study we propose an evaluation of the angular effects altering the spectral response of the land-cover over multi-angle remote sensing image acquisitions. The shift in the statistical distribution of the pixels observed in an in-track sequence of WorldView-2 images is analyzed by means of a kernel-based measure of distance between probability distributions. Afterwards, the portability of supervised classifiers across the sequence is investigated by looking at the evolution of the classification accuracy with respect to the changing observation angle. In this context, the efficiency of various physically and statistically based preprocessing methods in obtaining angle-invariant data spaces is compared and possible synergies are discussed.
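A kernel-based distance between two pixel distributions can be illustrated with the maximum mean discrepancy (MMD). The abstract does not name the measure it uses, so MMD with an RBF kernel is an assumption made for this sketch, as are the function names, the `gamma` value, and the toy samples.

```python
import math

def rbf(x, y, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between two samples."""
    def mean_k(a, b):
        return sum(rbf(p, q, gamma) for p in a for q in b) / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)

# Invented toy "pixels" (two bands) from a reference image, and the same
# pixels with the first band shifted to mimic an angular effect.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
small_shift = [(x + 0.5, y) for x, y in reference]
large_shift = [(x + 2.0, y) for x, y in reference]

d_small = mmd2(reference, small_shift)
d_large = mmd2(reference, large_shift)
```

Computed between the first image of a sequence and each later acquisition, a measure of this kind gives one number per observation angle that can be set beside the classification accuracy at that angle.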
Abstract:
Artifacts are present in most electroencephalography (EEG) recordings, making it difficult to interpret or analyze the data. In this paper a cleaning procedure based on a multivariate extension of empirical mode decomposition is used to improve the quality of the data. This is achieved by applying the cleaning method to raw EEG data. Then, a synchrony measure is applied to the raw and the clean data in order to compare the improvement of the classification rate. Two classifiers are used: linear discriminant analysis and neural networks. In both cases, the classification rate is improved by about 20%.
Abstract:
In this paper, mixed spectral-structural kernel machines are proposed for the classification of very-high resolution images. The simultaneous use of multispectral and structural features (computed using morphological filters) allows a significant increase in classification accuracy of remote sensing images. Subsequently, weighted summation kernel support vector machines are proposed and applied in order to take into account the multiscale nature of the scene considered. Such classifiers use the Mercer property of kernel matrices to compute a new kernel matrix accounting simultaneously for two scale parameters. Tests on a Zurich QuickBird image show the relevance of the proposed method: using the mixed spectral-structural features, the classification accuracy increases by about 5%, achieving a Kappa index of 0.97. The proposed multikernel approach provides an overall accuracy of 98.90% with a related Kappa index of 0.985.
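The weighted summation kernel can be sketched as a convex combination of two RBF kernel matrices computed at different scales. A non-negative weighted sum of Mercer kernels is again a Mercer kernel, which is the property the abstract relies on. The function names, the scale values, and the weight `mu` below are hypothetical, and the abstract does not state how the authors chose them.

```python
import math

def rbf_matrix(points, sigma):
    """Gram matrix of the RBF kernel exp(-||x - y||^2 / (2 sigma^2))."""
    def k(x, y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, y))
        return math.exp(-d2 / (2 * sigma ** 2))
    return [[k(x, y) for y in points] for x in points]

def weighted_sum_kernel(points, sigma_a, sigma_b, mu):
    """K = mu * K_a + (1 - mu) * K_b, with 0 <= mu <= 1."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    ka = rbf_matrix(points, sigma_a)
    kb = rbf_matrix(points, sigma_b)
    return [[mu * a + (1 - mu) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(ka, kb)]

# Invented feature vectors standing in for stacked spectral-structural pixels.
pixels = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = weighted_sum_kernel(pixels, sigma_a=0.5, sigma_b=2.0, mu=0.3)
```

A matrix built this way can be passed to any SVM implementation that accepts a precomputed kernel; `mu` and the two scales would then be tuned like any other hyperparameter.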