15 results for Classification Methods
Abstract:
Smart management of maintenance has become fundamental in manufacturing environments in order to decrease downtime and the costs associated with failures. Predictive Maintenance (PdM) systems based on Machine Learning (ML) techniques can drastically decrease failure-related expenses at low added cost; given the increasing availability of data and the growing capabilities of ML tools, PdM systems are becoming very popular, especially in semiconductor manufacturing. A PdM module based on Classification methods is presented here for the prediction of integral type faults that are related to machine usage and stress of equipment parts. The module has been applied to an important class of semiconductor processes, ion-implantation, for the prediction of ion-source tungsten filament breaks. The PdM has been tested on a real production dataset. © 2013 IEEE.
Abstract:
Classification methods with embedded feature selection capability are very appealing for the analysis of complex processes since they allow the analysis of root causes even when the number of input variables is high. In this work, we investigate the performance of three techniques for classification within a Monte Carlo strategy with the aim of root cause analysis. We consider the naive Bayes classifier and the logistic regression model with two different implementations for controlling model complexity, namely, a LASSO-like implementation with an L1 norm regularization and a fully Bayesian implementation of the logistic model, the so-called relevance vector machine. Several challenges can arise when estimating such models, mainly linked to the characteristics of the data: a large number of input variables, high correlation among subsets of variables, the situation where the number of variables is higher than the number of available data points and the case of unbalanced datasets. Using an ecological and a semiconductor manufacturing dataset, we show advantages and drawbacks of each method, highlighting the superior performance in terms of classification accuracy for the relevance vector machine with respect to the other classifiers. Moreover, we show how the combination of the proposed techniques and the Monte Carlo approach can be used to get more robust insights into the problem under analysis when faced with challenging modelling conditions.
Abstract:
The Magellanic Clouds are uniquely placed to study the stellar contribution to dust emission. Individual stars can be resolved in these systems even in the mid-infrared, and they are close enough to allow detection of infrared excess caused by dust. We have searched the Spitzer Space Telescope data archive for all Infrared Spectrograph (IRS) staring-mode observations of the Small Magellanic Cloud (SMC) and found that 209 Infrared Array Camera (IRAC) point sources within the footprint of the Surveying the Agents of Galaxy Evolution in the Small Magellanic Cloud (SAGE-SMC) Spitzer Legacy programme were targeted, within a total of 311 staring-mode observations. We classify these point sources using a decision tree method of object classification, based on infrared spectral features, continuum and spectral energy distribution shape, bolometric luminosity, cluster membership and variability information. We find 58 asymptotic giant branch (AGB) stars, 51 young stellar objects, 4 post-AGB objects, 22 red supergiants, 27 stars (of which 23 are dusty OB stars), 24 planetary nebulae (PNe), 10 Wolf-Rayet stars, 3 H II regions, 3 R Coronae Borealis stars, 1 Blue Supergiant and 6 other objects, including 2 foreground AGB stars. We use these classifications to evaluate the success of photometric classification methods reported in the literature.
Abstract:
Objectives: This study examined the validity of a latent class typology of adolescent drinking based on four alcohol dimensions: frequency of drinking, quantity consumed, frequency of binge drinking and the number of alcohol related problems encountered. Method: Data used were from the 1970 British Cohort Study sixteen-year-old follow-up. Partial or complete responses to the selected alcohol measures were provided by 6,516 cohort members. The data were collected via a series of postal questionnaires. Results: A five class LCA typology was constructed. Around 12% of the sample were classified as ‘hazardous drinkers’ reporting frequent drinking, high levels of alcohol consumed, frequent binge drinking and multiple alcohol related problems. Multinomial logistic regression, with multiple imputation for missing data, was used to assess the covariates of adolescent drinking patterns. Hazardous drinking was associated with being white, being male, having heavy drinking parents (in particular fathers), smoking, illicit drug use, and minor and violent offending behaviour. Non-significant associations were found between drinking patterns and general mental health and attention deficit disorder. Conclusion: The latent class typology exhibited concurrent validity in terms of its ability to distinguish respondents across a number of alcohol and non-alcohol indicators. Notwithstanding a number of limitations, latent class analysis offers an alternative data reduction method for the construction of drinking typologies that addresses known weaknesses inherent in more traditional classification methods.
Abstract:
In semiconductor fabrication processes, effective management of maintenance operations is fundamental to decrease costs associated with failures and downtime. Predictive Maintenance (PdM) approaches, based on statistical methods and historical data, are becoming popular for their predictive capabilities and low (potentially zero) added costs. We present here a PdM module based on Support Vector Machines for prediction of integral type faults, that is, the kind of failures that happen due to machine usage and stress of equipment parts. The proposed module may also be employed as a health factor indicator. The module has been applied to a frequent maintenance problem in semiconductor manufacturing industry, namely the breaking of the filament in the ion-source of ion-implantation tools. The PdM has been tested on a real production dataset. © 2013 IEEE.
Abstract:
The in-line measurement of COD and NH4-N in the WWTP inflow is crucial for the timely monitoring of biological wastewater treatment processes and for the development of advanced control strategies for optimized WWTP operation. As a direct measurement of COD and NH4-N requires expensive and high-maintenance in-line probes or analyzers, an approach estimating COD and NH4-N based on standard and spectroscopic in-line inflow measurement systems using Machine Learning Techniques is presented in this paper. The results show that COD estimation using Random Forest Regression with a normalized MSE of 0.3, which is sufficiently accurate for practical applications, can be achieved using only standard in-line measurements. In the case of NH4-N, a good estimation using Partial Least Squares Regression with a normalized MSE of 0.16 is only possible based on a combination of standard and spectroscopic in-line measurements. Furthermore, the comparison of regression and classification methods shows that both methods perform equally well in most cases.
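The abstract above reports a "normalized MSE" without defining it. As a rough illustration only, the sketch below assumes one common convention: the mean squared error divided by the variance of the measured values, so that always predicting the mean scores 1.0. The function name `normalized_mse` and the sample numbers are hypothetical and are not taken from the paper.

```python
def normalized_mse(y_true, y_pred):
    """MSE divided by the variance of y_true (an assumed normalization)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Mean squared prediction error.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    # Population variance of the measured values.
    variance = sum((t - mean_true) ** 2 for t in y_true) / n
    if variance == 0:
        raise ValueError("y_true has zero variance; normalization undefined")
    return mse / variance


# Made-up example values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [1.0, 2.0, 3.0, 5.0]
score = normalized_mse(measured, estimated)
```

Under this convention a value near 0 means the estimate tracks the measurements closely, while a value of 1.0 is no better than a constant prediction at the mean.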
Abstract:
Sediment particle size analysis (PSA) is routinely used to support benthic macrofaunal community distribution data in habitat mapping and Ecological Status (ES) assessment. No optimal PSA Method to explain variability in multivariate macrofaunal distribution has been identified nor have the effects of changing sampling strategy been examined. Here, we use benthic macrofaunal and PSA grabs from two embayments in the south of Ireland. Four frequently used PSA Methods and two common sampling strategies are applied. A combination of laser particle sizing and wet/dry sieving without peroxide pre-treatment to remove organics was identified as the optimal Method for explaining macrofaunal distributions. ES classifications and EUNIS sediment classification were robust to changes in PSA Method. Fauna and PSA samples returned from the same grab sample significantly decreased macrofaunal variance explained by PSA and caused ES to be classified as lower. Employing the optimal PSA Method and sampling strategy will improve benthic monitoring. © 2012 Elsevier Ltd.
Abstract:
Grey Level Co-occurrence Matrix (GLCM), one of the best known tools for texture analysis, estimates image properties related to second-order statistics. These image properties, commonly known as Haralick texture features, can be used for image classification, image segmentation, and remote sensing applications. However, their computation is highly intensive, especially for very large images such as medical ones. Therefore, methods to accelerate their computation are highly desired. This paper proposes the use of programmable hardware to accelerate the calculation of GLCM and Haralick texture features. Further, as an example of the speedup offered by programmable logic, a multispectral computer vision system for automatic diagnosis of prostatic cancer has been implemented. The performance is then compared against a microprocessor based solution.
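To make the GLCM idea in the abstract above concrete, here is a minimal software sketch, not the hardware design the paper proposes. It counts how often a grey level i has a neighbour of grey level j at a fixed pixel offset, normalizes the counts to probabilities, and derives two of the Haralick features (contrast and angular second moment). The function names, the offset convention and the tiny test image are assumptions made for illustration.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Only count pairs whose neighbour lies inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Haralick contrast: sum of (i - j)^2 * p(i, j)."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


def angular_second_moment(p):
    """Haralick angular second moment: sum of p(i, j)^2."""
    return sum(v * v for row in p for v in row)


# A 2x3 image with two grey levels, horizontal neighbour offset.
tiny = [[0, 0, 1],
        [1, 1, 0]]
p = glcm(tiny, levels=2)
```

The nested loops touch every pixel once per offset, which is why the cost grows quickly for large images and many offsets, the bottleneck the paper aims to accelerate.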
Abstract:
This paper considers invariant texture analysis. Texture analysis approaches whose performances are not affected by translation, rotation, affine, and perspective transform are addressed. Existing invariant texture analysis algorithms are carefully studied and classified into three categories: statistical methods, model based methods, and structural methods. The importance of invariant texture analysis is presented first. Each approach is reviewed according to its classification, and its merits and drawbacks are outlined. The focus of possible future work is also suggested.
Abstract:
PURPOSE. To describe and classify patterns of abnormal fundus autofluorescence (FAF) in eyes with early nonexudative age-related macular disease (AMD). METHODS. FAF images were recorded in eyes with early AMD by confocal scanning laser ophthalmoscopy (cSLO) with excitation at 488 nm (argon or OPSL laser) and emission above 500 or 521 nm (barrier filter). A standardized protocol for image acquisition and generation of mean images after automated alignment was applied, and routine fundus photographs were obtained. FAF images were classified by two independent observers. The κ statistic was applied to assess intra- and interobserver variability. RESULTS. Alterations in FAF were classified into eight phenotypic patterns including normal, minimal change, focal increased, patchy, linear, lacelike, reticular, and speckled. Areas with abnormal increased or decreased FAF signals may or may not have corresponded to funduscopically visible alterations. For intraobserver variability, κ of observer I was 0.80 (95% confidence interval [CI], 0.71-0.89) and of observer II, 0.74 (95% CI, 0.64-0.84). For interobserver variability, κ was 0.77 (95% CI, 0.67-0.87). CONCLUSIONS. Various phenotypic patterns of abnormal FAF can be identified with cSLO imaging. Distinct patterns may reflect heterogeneity at a cellular and molecular level in contrast to a nonspecific aging process. The results indicate that the classification system yields a relatively high degree of intra- and interobserver agreement. It may be applicable for determination of novel prognostic determinants in longitudinal natural history studies, for identification of genetic risk factors, and for monitoring of future therapeutic interventions to slow the progression of early AMD. Copyright © Association for Research in Vision and Ophthalmology.
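The agreement statistic used in the abstract above can be sketched in a few lines. This assumes the unweighted Cohen's kappa for two raters, (observed agreement minus chance agreement) divided by (1 minus chance agreement); the abstract does not say which kappa variant was used. The function name and the example labels are hypothetical and are not data from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up pattern labels for four images, for illustration only.
observer_1 = ["normal", "patchy", "patchy", "linear"]
observer_2 = ["normal", "patchy", "linear", "linear"]
kappa = cohen_kappa(observer_1, observer_2)
```

Running the same function on two grading sessions by one observer would give an intraobserver value; running it on gradings from two observers gives an interobserver value.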
Abstract:
Background - Iris cysts in children are uncommon and there is relatively little information on their classification, incidence, and management. Methods - The records of all children under age 20 years who were diagnosed with iris cyst were reviewed and the types and incidence of iris cysts of childhood determined. Based on these observations recommendations were made regarding management of iris cysts in children. Results - Of 57 iris cysts in children, 53 were primary and four were secondary. There were 44 primary cysts of the iris pigment epithelium, 34 of which were of the peripheral or iridociliary type, accounting for 59% of all childhood iris cysts. It was most commonly diagnosed in the teenage years, more common in girls (68%), was not recognised in infancy, remained stationary or regressed, and required no treatment. The five mid-zonal pigment epithelial cysts were diagnosed at a mean age of 14 years, were more common in boys (83%), remained stationary, and required no treatment. The pupillary type of pigment epithelial cyst was generally recognised in infancy and, despite involvement of the pupillary aperture, also required no treatment. There were nine cases of primary iris stromal cysts, accounting for 16% of all childhood iris cysts. This cyst was usually diagnosed in infancy, was generally progressive, and required treatment in eight of the nine cases, usually by aspiration and cryotherapy or surgical resection. Among the secondary iris cysts, two were post-traumatic epithelial ingrowth cysts and two were tumour induced cysts, one arising from an intraocular lacrimal gland choristoma and one adjacent to a peripheral iris naevus. Conclusions - Most iris cysts of childhood are primary pigment epithelial cysts and require no treatment. However, the iris stromal cyst, usually recognised in infancy, is generally an aggressive lesion that requires treatment by aspiration or surgical excision.
Abstract:
Aims/hypothesis: Diabetic nephropathy is a major diabetic complication, and diabetes is the leading cause of end-stage renal disease (ESRD). Family studies suggest a hereditary component for diabetic nephropathy. However, only a few genes have been associated with diabetic nephropathy or ESRD in diabetic patients. Our aim was to detect novel genetic variants associated with diabetic nephropathy and ESRD. Methods: We exploited a novel algorithm, ‘Bag of Naive Bayes’, whose marker selection strategy is complementary to that of conventional genome-wide association models based on univariate association tests. The analysis was performed on a genome-wide association study of 3,464 patients with type 1 diabetes from the Finnish Diabetic Nephropathy (FinnDiane) Study and subsequently replicated with 4,263 type 1 diabetes patients from the Steno Diabetes Centre, the All Ireland-Warren 3-Genetics of Kidneys in Diabetes UK collection (UK–Republic of Ireland) and the Genetics of Kidneys in Diabetes US Study (GoKinD US). Results: Five genetic loci (WNT4/ZBTB40-rs12137135, RGMA/MCTP2-rs17709344, MAPRE1P2-rs1670754, SEMA6D/SLC24A5-rs12917114 and SIK1-rs2838302) were associated with ESRD in the FinnDiane study. An association between ESRD and rs17709344, tagging the previously identified rs12437854 and located between the RGMA and MCTP2 genes, was replicated in independent case–control cohorts. rs12917114 near SEMA6D was associated with ESRD in the replication cohorts under the genotypic model (p < 0.05), and rs12137135 upstream of WNT4 was associated with ESRD in Steno. Conclusions/interpretation: This study supports the previously identified findings on the RGMA/MCTP2 region and suggests novel susceptibility loci for ESRD. This highlights the importance of applying complementary statistical methods to detect novel genetic variants in diabetic nephropathy and, in general, in complex diseases.
Abstract:
Pollen grains are microscopic so their identification and quantification has, for decades, depended upon human observers using light microscopes: a labour-intensive approach. Modern improvements in computing and imaging hardware and software now bring automation of pollen analyses within reach. In this paper, we provide the first review in over 15 yr of progress towards automation of the part of palynology concerned with counting and classifying pollen, bringing together literature published from a wide spectrum of sources. We consider which attempts offer the most potential for an automated palynology system for universal application across all fields of research concerned with pollen classification and counting. We discuss what is required to make the datasets of these automated systems as acceptable as those produced by human palynologists, and present suggestions for how automation will generate novel approaches to counting and classifying pollen that have hitherto been unthinkable.
Abstract:
Although pattern recognition methods for human behavioral analysis have flourished in the last decade, animal behavioral analysis has been almost neglected. The few existing approaches are mostly focused on preserving livestock economic value, while attention to the welfare of companion animals, like dogs, is now emerging as a social need. In this work, following the analogy with human behavior recognition, we propose a system for recognizing body parts of dogs kept in pens. We adopt both 2D and 3D features in order to obtain a rich description of the dog model. Images are acquired using the Microsoft Kinect to capture the depth map images of the dog. On the depth maps, a Structural Support Vector Machine (SSVM) is employed to identify the body parts using both 3D features and 2D images. The proposal relies on a kernelized discriminative structural classifier specifically tailored for dogs, independently of size and breed. The classification is performed in an online fashion using the LaRank optimization technique to obtain real-time performance. Promising results have emerged during the experimental evaluation carried out at a dog shelter, managed by IZSAM, in Teramo, Italy.
Abstract:
Background and aims: Machine learning techniques for the text mining of cancer-related clinical documents have not been sufficiently explored. Here some techniques are presented for the pre-processing of free-text breast cancer pathology reports, with the aim of facilitating the extraction of information relevant to cancer staging.
Materials and methods: The first technique was implemented using the freely available software RapidMiner to classify the reports according to their general layout: ‘semi-structured’ and ‘unstructured’. The second technique was developed using the open source language engineering framework GATE and aimed at the prediction of chunks of the report text containing information pertaining to the cancer morphology, the tumour size, its hormone receptor status and the number of positive nodes. The classifiers were trained and tested respectively on sets of 635 and 163 manually classified or annotated reports, from the Northern Ireland Cancer Registry.
Results: The best result of 99.4% accuracy – which included only one semi-structured report predicted as unstructured – was produced by the layout classifier with the k nearest algorithm, using the binary term occurrence word vector type with stopword filter and pruning. For chunk recognition, the best results were found using the PAUM algorithm with the same parameters for all cases, except for the prediction of chunks containing cancer morphology. For semi-structured reports the performance ranged from 0.97 to 0.94 and from 0.92 to 0.83 in precision and recall, while for unstructured reports performance ranged from 0.91 to 0.64 and from 0.68 to 0.41 in precision and recall. Poor results were found when the classifier was trained on semi-structured reports but tested on unstructured.
Conclusions: These results show that it is possible and beneficial to predict the layout of reports and that the accuracy of prediction of which segments of a report may contain certain information is sensitive to the report layout and the type of information sought.
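The precision and recall figures quoted in the abstract above follow the usual definitions, which the short sketch below spells out for a binary decision such as "this chunk contains tumour size information". It is an illustration only: the function name and the toy labels are hypothetical, and it does not reproduce the RapidMiner or GATE pipelines described in the paper.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Convention assumed here: report 0.0 when a ratio is undefined.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy annotations: 1 = chunk contains the target information.
gold = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
prec, rec = precision_recall(gold, predicted)
```

High precision with low recall, as reported for unstructured reports, means that predicted chunks are usually correct but many relevant chunks are missed.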