901 results for classification method
Abstract:
Hierarchical multi-label classification is a complex classification task in which the classes are hierarchically structured and each example may simultaneously belong to more than one class at each hierarchical level. In this paper, we extend our previous work, in which we investigated a new local-based classification method that incrementally trains a multi-layer perceptron for each level of the classification hierarchy. Predictions made by the neural network at a given level are used as inputs to the neural network responsible for prediction at the next level. We compare the proposed method with one state-of-the-art decision-tree induction method and two decision-tree induction methods, using several hierarchical multi-label classification datasets. We perform a thorough experimental analysis, showing that our method obtains results competitive with a robust global method on both precision and recall evaluation measures.
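The level-by-level chaining described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a single sigmoid layer stands in for each multi-layer perceptron, and concatenating the original features with the previous level's outputs is an assumption about how the predictions are passed on. The names `LogisticLayer`, `train_chain` and `predict_chain` are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Clamp the argument so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class LogisticLayer:
    """One sigmoid output per class: a simple stand-in for a per-level MLP."""

    def __init__(self, n_in, n_out, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(n_out)]
        self.b = [0.0] * n_out

    def predict(self, x):
        return [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, X, Y, epochs=500, lr=0.5):
        # Plain stochastic gradient descent on the cross-entropy loss.
        for _ in range(epochs):
            for x, y in zip(X, Y):
                p = self.predict(x)
                for k in range(len(self.w)):
                    err = p[k] - y[k]
                    for j in range(len(x)):
                        self.w[k][j] -= lr * err * x[j]
                    self.b[k] -= lr * err


def train_chain(X, Y_levels):
    """Train one model per hierarchy level, feeding outputs forward."""
    models = []
    inputs = [list(x) for x in X]
    for level, Y in enumerate(Y_levels):
        model = LogisticLayer(len(inputs[0]), len(Y[0]), seed=level)
        model.fit(inputs, Y)
        models.append(model)
        # Assumption: the next level sees the current inputs plus predictions.
        inputs = [x + model.predict(x) for x in inputs]
    return models


def predict_chain(models, x):
    """Return one list of class scores per hierarchy level."""
    x = list(x)
    outputs = []
    for model in models:
        p = model.predict(x)
        outputs.append(p)
        x = x + p
    return outputs
```

Calling `train_chain(X, [Y_level1, Y_level2])` with one multi-label target matrix per level returns the per-level models; `predict_chain` then yields a score vector for each level, which can be thresholded to obtain the predicted label sets.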
Abstract:
The purpose of this thesis is to develop a robust and powerful method to classify galaxies from large surveys, in order to establish and confirm the connections between the principal observational parameters of galaxies (spectral features, colours, morphological indices) and to help unveil the evolution of these parameters from $z \sim 1$ to the local Universe. Within the framework of the zCOSMOS-bright survey, and making use of its large database of objects ($\sim 10\,000$ galaxies in the redshift range $0 < z \lesssim 1.2$) and its great reliability in redshift and spectral property determinations, we first adopt and extend the classification cube method, as developed by Mignoli et al. (2009), to exploit the bimodal properties of galaxies (spectral, photometric and morphological) separately, and then combine these three subclassifications. We use this classification method as a test for a newly devised statistical classification, based on Principal Component Analysis and the Unsupervised Fuzzy Partition clustering method (PCA+UFP), which is able to define the galaxy population by exploiting its natural global bimodality, considering up to 8 different properties simultaneously. The PCA+UFP analysis is a very powerful and robust tool to probe the nature and the evolution of galaxies in a survey. It allows galaxies to be classified with fewer uncertainties and adds the flexibility to be adapted to different parameters: being a fuzzy classification, it avoids the problems of a hard classification such as the classification cube presented in the first part of the article. The PCA+UFP method can be easily applied to different datasets: it does not rely on the nature of the data, and for this reason it can be successfully employed with other observables (magnitudes, colours) or derived properties (masses, luminosities, SFRs, etc.). The agreement between the two classification cluster definitions is very high.
"Early" and "late" type galaxies are well defined by their spectral, photometric and morphological properties, both when these are considered separately and the classifications then combined (classification cube) and when they are treated as a whole (PCA+UFP cluster analysis). Differences arise in the definition of outliers: the classification cube is much more sensitive to single measurement errors or misclassifications in one property than the PCA+UFP cluster analysis, in which errors are "averaged out" during the process. This method allowed us to observe the downsizing effect taking place in the PC spaces: the migration from the blue cloud towards the red clump happens at higher redshifts for galaxies of larger mass. The determination of the transition mass, $M_{\mathrm{cross}}$, is in significant agreement with other values in the literature.
Abstract:
Current methods to characterize mesenchymal stem cells (MSCs) are limited to CD marker expression, plastic adherence and their ability to differentiate into adipogenic, osteogenic and chondrogenic precursors. It seems evident that stem cells undergoing differentiation should differ in many aspects, such as morphology and possibly also behaviour; however, such a correlation has not yet been exploited for fate prediction of MSCs. Primary human MSCs from bone marrow were expanded and pelleted to form high-density cultures and were then randomly divided into four groups to differentiate into adipogenic, osteogenic, chondrogenic and myogenic progenitor cells. The cells were expanded as heterogeneous and tracked with time-lapse microscopy to record cell shape, using phase-contrast microscopy. The cells were segmented using a custom-made image-processing pipeline. Seven morphological features were extracted for each of the segmented cells. Statistical analysis was performed on the seven-dimensional feature vectors, using a tree-like classification method. Differentiation of cells was monitored with key marker genes and histology. Cells in differentiation media expressed the key genes for each of the three pathways (adipogenic, osteogenic and chondrogenic) after 21 days, which was also confirmed by histological staining. Time-lapse microscopy data provided new evidence that two cell shape features, eccentricity and filopodia ('fingers'), are highly informative for distinguishing myogenic differentiation from all others. However, no robust classifiers could be identified for the other cell differentiation paths. The results suggest that non-invasive automated time-lapse microscopy could potentially be used to predict the stem cell fate of hMSCs for clinical application, based on morphology at earlier time-points.
The classification is challenged by cell density, proliferation and possible unknown donor-specific factors, which affect the performance of morphology-based approaches. Copyright © 2012 John Wiley & Sons, Ltd.
Abstract:
Traditional methods do not measure people's risk attitudes naturally and precisely. Therefore, a fuzzy risk attitude classification method is developed. Since prospect theory is usually considered an effective model of decision making, the personalized parameters in prospect theory are first fuzzified to distinguish people with different risk attitudes, and then a fuzzy classification database schema is applied to calculate the exact values of risk value attitude and risk behavior attitude. Finally, by applying a two-hierarchical classification model, the precise value of the synthetic risk attitude can be acquired.
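The abstract does not state which prospect-theory parameters are personalized. As a rough illustration only, the sketch below uses the value function from Tversky and Kahneman's cumulative prospect theory, whose curvature parameters `alpha` and `beta` and loss-aversion coefficient `lam` are the kind of per-person quantities that could be fuzzified. The function name `prospect_value` is hypothetical, and the defaults are the median estimates reported by Tversky and Kahneman (1992), not values from this paper.

```python
def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Subjective value of an outcome x relative to the reference point 0.

    Gains are valued as x**alpha; losses as -lam * (-x)**beta, so a loss
    weighs more heavily than a gain of the same size when lam > 1.
    """
    if x >= 0:
        return x ** alpha
    return -lam * ((-x) ** beta)
```

Under these assumptions, a person with a larger `lam` is more loss-averse, and smaller `alpha` or `beta` means more strongly diminishing sensitivity to large gains or losses.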
Abstract:
In population studies, most current methods focus on identifying one outcome-related SNP at a time by testing for differences in genotype frequencies between disease and healthy groups or among different population groups. However, testing a great number of SNPs simultaneously raises the problem of multiple testing and will give false-positive results. Although this problem can be dealt with effectively through several approaches such as Bonferroni correction, permutation testing and false discovery rates, patterns of joint effects of several genes, each with a weak effect, might not be detectable. With the availability of high-throughput genotyping technology, searching for multiple scattered SNPs over the whole genome and modeling their joint effect on the target variable has become possible. Exhaustive search of all SNP subsets is computationally infeasible for millions of SNPs in a genome-wide study. Several effective feature selection methods combined with classification functions have been proposed to search for an optimal SNP subset in big data sets where the number of feature SNPs far exceeds the number of observations.
In this study, we take two steps to achieve this goal. First, we selected 1000 SNPs through an effective filter method, and then we performed feature selection wrapped around a classifier to identify an optimal SNP subset for predicting disease. We also developed a novel classification method, the sequential information bottleneck method, wrapped inside different search algorithms to identify an optimal subset of SNPs for classifying the outcome variable. This new method was compared with classical linear discriminant analysis in terms of classification performance. Finally, we performed a chi-square test to look at the relationship between each SNP and disease from another point of view.
In general, our results show that filtering features using the harmonic mean of sensitivity and specificity (HMSS) through linear discriminant analysis (LDA) is better than using LDA training accuracy or mutual information in our study. Our results also demonstrate that exhaustive search of small subsets of one, two or three SNPs, based on the best 100 composite 2-SNPs, can find an optimal subset, and that further inclusion of more SNPs through a heuristic algorithm does not always increase the performance of SNP subsets. Although sequential forward floating selection can be applied to prevent the nesting effect of forward selection, it does not always outperform the latter, due to overfitting from observing more complex subset states.
Our results also indicate that HMSS, as a criterion to evaluate the classification ability of a function, can be used on imbalanced data without modifying the original dataset, unlike classification accuracy. Our four studies suggest that Sequential Information Bottleneck (sIB), a new unsupervised technique, can be adopted to predict the outcome, and that its ability to detect the target status is superior to that of traditional LDA in the study.
From our results we can see that the best test probability-HMSS for predicting CVD, stroke, CAD and psoriasis through sIB is 0.59406, 0.641815, 0.645315 and 0.678658, respectively. In terms of group prediction accuracy, the highest test accuracy of sIB for diagnosing a normal status among controls can reach 0.708999, 0.863216, 0.639918 and 0.850275, respectively, in the four studies if the test accuracy among cases is required to be not less than 0.4. On the other hand, the highest test accuracy of sIB for diagnosing a disease among cases can reach 0.748644, 0.789916, 0.705701 and 0.749436, respectively, in the four studies if the test accuracy among controls is required to be at least 0.4.
A further genome-wide association study through chi-square test shows that no significant SNPs are detected at the cut-off level 9.09451E-08 in the Framingham heart study of CVD. Study results in WTCCC can only detect two significant SNPs that are associated with CAD. In the genome-wide study of psoriasis, most of the top 20 SNP markers with impressive classification accuracy are also significantly associated with the disease through chi-square test at the cut-off value 1.11E-07.
Although our classification methods can achieve high accuracy in the study, complete descriptions of those classification results (95% confidence interval or statistical test of differences) require more cost-effective methods or a more efficient computing system, neither of which can currently be achieved in our genome-wide study. We should also note that the purpose of this study is to identify subsets of SNPs with high prediction ability, and that SNPs with good discriminant power are not necessarily causal markers for the disease.
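The HMSS criterion named in this abstract is the harmonic mean of sensitivity and specificity. The following minimal sketch, with hypothetical function names and invented toy labels, shows why it behaves differently from plain accuracy on imbalanced data; it assumes binary labels with 1 for cases and 0 for controls.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def hmss(y_true, y_pred, positive=1):
    """Harmonic mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred, positive)
    if sens + spec == 0:
        return 0.0
    return 2 * sens * spec / (sens + spec)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

With 95 controls and 5 cases, a classifier that labels everyone as a control reaches an accuracy of 0.95 but an HMSS of 0, because its sensitivity is 0. This is the sense in which HMSS can be used on imbalanced data without rebalancing the dataset.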
Abstract:
* This study was supported in part by the Natural Sciences and Engineering Research Council of Canada, and by the Gastrointestinal Motility Laboratory (University of Alberta Hospitals) in Edmonton, Alberta, Canada.
Abstract:
Inspired by the human visual cognition mechanism, this paper first presents a scene classification method based on an improved standard model feature. Compared with state-of-the-art efforts in scene classification, the newly proposed method is more robust, more selective, and of lower complexity. These advantages are demonstrated by two sets of experiments on both our own database and standard public ones. Furthermore, occlusion and disorder problems in scene classification for video surveillance are studied for the first time in this paper. © 2010 IEEE.
Abstract:
Dissertation presented at the Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa to obtain the degree of Master in Informatics Engineering
Abstract:
Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies
Abstract:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Abstract:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Abstract:
Background: Idiopathic dilated cardiomyopathy (IDCM) is the most common cardiac cause of pediatric deaths; its mortality descriptors are a low left ventricular ejection fraction (LVEF) and low functional capacity (FC). FC is never self-reported by children. Objective: The aims of this study were (i) to evaluate whether functional classifications according to the children, parents and medical staff were associated; (iv) to evaluate whether there was a correlation between VO2 max and Weber's classification. Method: Prepubertal children with IDCM and HF (by previous IDCM and preserved LVEF) were selected, evaluated and compared. All children were assessed by testing, CPET and functional class classification. Results: The chi-square test showed an association between CFm and CFp, χ²(1, n = 31) = 20.6; p = 0.002. There was no significant association between CFp and CFc, χ²(1, n = 31) = 6.7; p = 0.4. CFm and CFc were not associated either, χ²(1, n = 31) = 1.7; p = 0.8. Weber's classification was associated with CFm, χ²(1, n = 19) = 11.8; p = 0.003, with CFp, χ²(1, n = 19) = 20.4; p = 0.0001, and with CFc, χ²(1, n = 19) = 6.4; p = 0.04. Conclusion: Drawings were helpful for children's self NYHA classification, which was associated with Weber's stratification.
Abstract:
Distribution of socio-economic features in urban space is an important source of information for land and transportation planning. The metropolization phenomenon has changed the spatial distribution of types of professions and has given rise to different spatial patterns that the urban planner must know in order to plan a sustainable city. Such distributions can be discovered by statistical and learning algorithms through different methods. In this paper, an unsupervised classification method and a cluster detection method are discussed and applied to analyze the socio-economic structure of Switzerland. The unsupervised classification method, based on Ward's classification and self-organizing maps, is used to classify the municipalities of the country and makes it possible to reduce high-dimensional input information in order to interpret the socio-economic landscape. The cluster detection method, the spatial scan statistic, is used in a more specific manner to detect hot spots of certain types of service activities. The method is applied to the distribution services in the agglomeration of Lausanne. Results show the emergence of new centralities and can be analyzed in both transportation and social terms.
Abstract:
Raman spectroscopy has become an attractive tool for the analysis of pharmaceutical solid dosage forms. In the present study it is used to confirm the identity of tablets. The two main applications of this method are the release of final products in quality control and the detection of counterfeits. Twenty-five product families of tablets have been included in the spectral library, and a non-linear classification method, Support Vector Machines (SVMs), has been employed. Two calibrations have been developed in cascade: the first identifies the product family, while the second specifies the formulation. A product family comprises different formulations that have the same active pharmaceutical ingredient (API) but in different amounts. Once the tablets have been classified by the SVM model, API peak detection and correlation are applied in order to make the identification method specific and, in the future, to allow counterfeits to be discriminated from genuine products. This calibration strategy enables the identification of 25 product families without error and in the absence of prior information about the sample. Raman spectroscopy coupled with chemometrics is therefore a fast and accurate tool for the identification of pharmaceutical tablets.
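The two-stage cascade described in this abstract (product family first, then formulation within that family) can be sketched as follows. This is a structural illustration only: a nearest-centroid rule is swapped in for the SVMs used in the study, spectra are represented as short lists of intensities, and the names `NearestCentroid`, `fit_cascade` and `predict_cascade` are hypothetical.

```python
import math


def centroid(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


class NearestCentroid:
    """Assigns the label whose class centroid is closest (stand-in for an SVM)."""

    def fit(self, X, labels):
        groups = {}
        for x, label in zip(X, labels):
            groups.setdefault(label, []).append(x)
        self.centroids = {label: centroid(rows)
                          for label, rows in groups.items()}
        return self

    def predict(self, x):
        return min(self.centroids,
                   key=lambda label: math.dist(x, self.centroids[label]))


def fit_cascade(X, families, formulations):
    """Stage 1 separates families; stage 2 has one model per family."""
    family_model = NearestCentroid().fit(X, families)
    formulation_models = {}
    for family in set(families):
        idx = [i for i, f in enumerate(families) if f == family]
        formulation_models[family] = NearestCentroid().fit(
            [X[i] for i in idx], [formulations[i] for i in idx])
    return family_model, formulation_models


def predict_cascade(models, x):
    """Return (family, formulation) for one spectrum."""
    family_model, formulation_models = models
    family = family_model.predict(x)
    return family, formulation_models[family].predict(x)
```

One consequence of this design is that the second-stage model only has to separate formulations that share the same API, so each sub-problem is smaller than a single flat classification over all formulations.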
Abstract:
Erythropoietin (EPO) is a glycoprotein hormone endogenously produced by the kidney, whose main physiological role is the stimulation of erythropoiesis. Since the beginning of the nineties, recombinant human EPO (rhEPO), a potent anti-anaemia drug, has been manufactured by the pharmaceutical industry. However, the erythropoiesis-stimulating power of rhEPO was rapidly misused by unscrupulous athletes to improve their performance in endurance sports. Endogenous EPO has the same amino-acid backbone as most recombinant forms; the molecules differ, however, in their respective glycosylation patterns. This difference constitutes the basis of the usual EPO screening test (IEF), developed in 2000 and still used in all anti-doping laboratories of the world. To date, 3 EPO generations have been commercialized. The fight against EPO abuse is a continuous challenge for anti-doping laboratories. The diversity of recombinant EPO forms and the continuous development of new ones considerably complicate the identification of EPO doping. Several facets of this fight were investigated in this work. One of the limiting aspects of doping agent screening is the availability of positive samples. Therefore, 2nd and 3rd generation EPOs, namely NESP and C.E.R.A., were injected into healthy subjects in pilot clinical studies. These studies made it possible to review the current EPO identification criteria defined by the World Anti-Doping Agency (WADA) in the case of NESP and to validate and implement a new assay targeting C.E.R.A. in human serum. Both studies resulted in the determination of the respective detection windows of NESP and C.E.R.A. in biological fluids. Following that, Dynepo, a 1st generation EPO presenting similarities with the endogenous form, was also the subject of a similar clinical study.
Our work aimed to overcome the current identification criteria, which are not adapted to Dynepo, and to propose an alternative pattern classification method based on the discriminant analysis of IEF EPO profiles. This method might be validated for other EPO forms in the future. The detection window of this molecule was also determined. Under particular conditions, confounding effects can complicate the identification of EPO in biological matrices. For example, athletes who have performed a strenuous physical effort can excrete modified isoforms of endogenous EPO, making it very similar to some recombinant forms. Such phenomena, called effort urines, were reproduced under controlled conditions and, after characterization of effort EPO, a urinary biochemical marker was proposed to unequivocally identify effort urines. It also happens that EPO analyses fail to detect endogenous levels of EPO. Such profiles were thoroughly investigated and potential causes identified. Natural reasons relying on urine properties and test specificity were underlined, but the possible addition of adulterant agents to urine samples was also considered. Therefore, a simple biochemical assay targeting the suspected substances was set up. Our work was based on the characterization of atypical EPO profiles from different origins. Therefore, 3 EPO molecules representing the 3 generations of the drug and 2 confounding effects complicating the interpretation of results were studied. These studies resulted in tangible applications for the laboratory, the best example of which is the C.E.R.A. assay, and in scientific findings that improve our understanding of EPO doping in sport.