968 resultados para statistical classification
Classification of Paintings by Artist, Movement, and Indoor Setting Using MPEG-7 Descriptor Features
Resumo:
ACM Computing Classification System (1998): I.4.9, I.4.10.
Resumo:
2010 Mathematics Subject Classification: 60J80.
Resumo:
2000 Mathematics Subject Classification: 41A25, 41A36, 40G15.
Resumo:
2000 Mathematics Subject Classification: 41A25, 41A36.
Resumo:
A Bázel–2. tőkeegyezmény bevezetését követően a bankok és hitelintézetek Magyarországon is megkezdték saját belső minősítő rendszereik felépítését, melyek karbantartása és fejlesztése folyamatos feladat. A szerző arra a kérdésre keres választ, hogy lehetséges-e a csőd-előrejelző modellek előre jelző képességét növelni a hagyományos matematikai-statisztikai módszerek alkalmazásával oly módon, hogy a modellekbe a pénzügyi mutatószámok időbeli változásának mértékét is beépítjük. Az empirikus kutatási eredmények arra engednek következtetni, hogy a hazai vállalkozások pénzügyi mutatószámainak időbeli alakulása fontos információt hordoz a vállalkozás jövőbeli fizetőképességéről, mivel azok felhasználása jelentősen növeli a csődmodellek előre jelző képességét. A szerző azt is megvizsgálja, hogy javítja-e a megfigyelések szélsőségesen magas vagy alacsony értékeinek modellezés előtti korrekciója a modellek klasszifikációs teljesítményét. ______ Banks and lenders in Hungary also began, after the introduction of the Basel 2 capital agreement, to build up their internal rating systems, whose maintenance and development are a continuing task. The author explores whether it is possible to increase the predictive capacity of business-failure forecasting models by traditional mathematical-cum-statistical means in such a way that they incorporate the measure of change in the financial indicators over time. Empirical findings suggest that the temporal development of the financial indicators of firms in Hungary carries important information about future ability to pay, since the predictive capacity of bankruptcy forecasting models is greatly increased by using such indicators. The author also examines whether the classification performance of the models can be improved by correcting for extremely high or low values before modelling.
Resumo:
The microarray technology provides a high-throughput technique to study gene expression. Microarrays can help us diagnose different types of cancers, understand biological processes, assess host responses to drugs and pathogens, find markers for specific diseases, and much more. Microarray experiments generate large amounts of data. Thus, effective data processing and analysis are critical for making reliable inferences from the data. ^ The first part of dissertation addresses the problem of finding an optimal set of genes (biomarkers) to classify a set of samples as diseased or normal. Three statistical gene selection methods (GS, GS-NR, and GS-PCA) were developed to identify a set of genes that best differentiate between samples. A comparative study on different classification tools was performed and the best combinations of gene selection and classifiers for multi-class cancer classification were identified. For most of the benchmarking cancer data sets, the gene selection method proposed in this dissertation, GS, outperformed other gene selection methods. The classifiers based on Random Forests, neural network ensembles, and K-nearest neighbor (KNN) showed consistently god performance. A striking commonality among these classifiers is that they all use a committee-based approach, suggesting that ensemble classification methods are superior. ^ The same biological problem may be studied at different research labs and/or performed using different lab protocols or samples. In such situations, it is important to combine results from these efforts. The second part of the dissertation addresses the problem of pooling the results from different independent experiments to obtain improved results. Four statistical pooling techniques (Fisher inverse chi-square method, Logit method. Stouffer's Z transform method, and Liptak-Stouffer weighted Z-method) were investigated in this dissertation. These pooling techniques were applied to the problem of identifying cell cycle-regulated genes in two different yeast species. As a result, improved sets of cell cycle-regulated genes were identified. The last part of dissertation explores the effectiveness of wavelet data transforms for the task of clustering. Discrete wavelet transforms, with an appropriate choice of wavelet bases, were shown to be effective in producing clusters that were biologically more meaningful. ^
Resumo:
Flow Cytometry analyzers have become trusted companions due to their ability to perform fast and accurate analyses of human blood. The aim of these analyses is to determine the possible existence of abnormalities in the blood that have been correlated with serious disease states, such as infectious mononucleosis, leukemia, and various cancers. Though these analyzers provide important feedback, it is always desired to improve the accuracy of the results. This is evidenced by the occurrences of misclassifications reported by some users of these devices. It is advantageous to provide a pattern interpretation framework that is able to provide better classification ability than is currently available. Toward this end, the purpose of this dissertation was to establish a feature extraction and pattern classification framework capable of providing improved accuracy for detecting specific hematological abnormalities in flow cytometric blood data. ^ This involved extracting a unique and powerful set of shift-invariant statistical features from the multi-dimensional flow cytometry data and then using these features as inputs to a pattern classification engine composed of an artificial neural network (ANN). The contribution of this method consisted of developing a descriptor matrix that can be used to reliably assess if a donor’s blood pattern exhibits a clinically abnormal level of variant lymphocytes, which are blood cells that are potentially indicative of disorders such as leukemia and infectious mononucleosis. ^ This study showed that the set of shift-and-rotation-invariant statistical features extracted from the eigensystem of the flow cytometric data pattern performs better than other commonly-used features in this type of disease detection, exhibiting an accuracy of 80.7%, a sensitivity of 72.3%, and a specificity of 89.2%. This performance represents a major improvement for this type of hematological classifier, which has historically been plagued by poor performance, with accuracies as low as 60% in some cases. This research ultimately shows that an improved feature space was developed that can deliver improved performance for the detection of variant lymphocytes in human blood, thus providing significant utility in the realm of suspect flagging algorithms for the detection of blood-related diseases.^
Resumo:
This thesis introduces two related lines of study on classification of hyperspectral images with nonlinear methods. First, it describes a quantitative and systematic evaluation, by the author, of each major component in a pipeline for classifying hyperspectral images (HSI) developed earlier in a joint collaboration [23]. The pipeline, with novel use of nonlinear classification methods, has reached beyond the state of the art in classification accuracy on commonly used benchmarking HSI data [6], [13]. More importantly, it provides a clutter map, with respect to a predetermined set of classes, toward the real application situations where the image pixels not necessarily fall into a predetermined set of classes to be identified, detected or classified with.
The particular components evaluated are a) band selection with band-wise entropy spread, b) feature transformation with spatial filters and spectral expansion with derivatives c) graph spectral transformation via locally linear embedding for dimension reduction, and d) statistical ensemble for clutter detection. The quantitative evaluation of the pipeline verifies that these components are indispensable to high-accuracy classification.
Secondly, the work extends the HSI classification pipeline with a single HSI data cube to multiple HSI data cubes. Each cube, with feature variation, is to be classified of multiple classes. The main challenge is deriving the cube-wise classification from pixel-wise classification. The thesis presents the initial attempt to circumvent it, and discuss the potential for further improvement.
Resumo:
Ignoring small-scale heterogeneities in Arctic land cover may bias estimates of water, heat and carbon fluxes in large-scale climate and ecosystem models. We investigated subpixel-scale heterogeneity in CHRIS/PROBA and Landsat-7 ETM+ satellite imagery over ice-wedge polygonal tundra in the Lena Delta of Siberia, and the associated implications for evapotranspiration (ET) estimation. Field measurements were combined with aerial and satellite data to link fine-scale (0.3 m resolution) with coarse-scale (upto 30 m resolution) land cover data. A large portion of the total wet tundra (80%) and water body area (30%) appeared in the form of patches less than 0.1 ha in size, which could not be resolved with satellite data. Wet tundra and small water bodies represented about half of the total ET in summer. Their contribution was reduced to 20% in fall, during which ET rates from dry tundra were highest instead. Inclusion of subpixel-scale water bodies increased the total water surface area of the Lena Delta from 13% to 20%. The actual land/water proportions within each composite satellite pixel was best captured with Landsat data using a statistical downscaling approach, which is recommended for reliable large-scale modelling of water, heat and carbon exchange from permafrost landscapes.
Resumo:
The main objective of this work was to develop a novel dimensionality reduction technique as a part of an integrated pattern recognition solution capable of identifying adulterants such as hazelnut oil in extra virgin olive oil at low percentages based on spectroscopic chemical fingerprints. A novel Continuous Locality Preserving Projections (CLPP) technique is proposed which allows the modelling of the continuous nature of the produced in-house admixtures as data series instead of discrete points. The maintenance of the continuous structure of the data manifold enables the better visualisation of this examined classification problem and facilitates the more accurate utilisation of the manifold for detecting the adulterants. The performance of the proposed technique is validated with two different spectroscopic techniques (Raman and Fourier transform infrared, FT-IR). In all cases studied, CLPP accompanied by k-Nearest Neighbors (kNN) algorithm was found to outperform any other state-of-the-art pattern recognition techniques.
Resumo:
The ocean and its resources are increasingly seen as indispensable in addressing the multiple challenges the planet is facing in the decades to come. It has never been easy to quantify this particular sector of the economy, in any country, given the lack of a detailed, centralized data base with adequate specifics covering the necessary sectors, this article aims to compare the existing ocean economy statistical systems, especially Asia-Pacific, American and European countries, in order to overcome the deficiencies with regard to the diversity of definitions and statistical representations of ocean sectors, establish the standard statistical system and compile data for the global ocean economy.
Resumo:
Résumé : Face à l’accroissement de la résolution spatiale des capteurs optiques satellitaires, de nouvelles stratégies doivent être développées pour classifier les images de télédétection. En effet, l’abondance de détails dans ces images diminue fortement l’efficacité des classifications spectrales; de nombreuses méthodes de classification texturale, notamment les approches statistiques, ne sont plus adaptées. À l’inverse, les approches structurelles offrent une ouverture intéressante : ces approches orientées objet consistent à étudier la structure de l’image pour en interpréter le sens. Un algorithme de ce type est proposé dans la première partie de cette thèse. Reposant sur la détection et l’analyse de points-clés (KPC : KeyPoint-based Classification), il offre une solution efficace au problème de la classification d’images à très haute résolution spatiale. Les classifications effectuées sur les données montrent en particulier sa capacité à différencier des textures visuellement similaires. Par ailleurs, il a été montré dans la littérature que la fusion évidentielle, reposant sur la théorie de Dempster-Shafer, est tout à fait adaptée aux images de télédétection en raison de son aptitude à intégrer des concepts tels que l’ambiguïté et l’incertitude. Peu d’études ont en revanche été menées sur l’application de cette théorie à des données texturales complexes telles que celles issues de classifications structurelles. La seconde partie de cette thèse vise à combler ce manque, en s’intéressant à la fusion de classifications KPC multi-échelle par la théorie de Dempster-Shafer. Les tests menés montrent que cette approche multi-échelle permet d’améliorer la classification finale dans le cas où l’image initiale est de faible qualité. De plus, l’étude effectuée met en évidence le potentiel d’amélioration apporté par l’estimation de la fiabilité des classifications intermédiaires, et fournit des pistes pour mener ces estimations.
Resumo:
In order to optimize frontal detection in sea surface temperature fields at 4 km resolution, a combined statistical and expert-based approach is applied to test different spatial smoothing of the data prior to the detection process. Fronts are usually detected at 1 km resolution using the histogram-based, single image edge detection (SIED) algorithm developed by Cayula and Cornillon in 1992, with a standard preliminary smoothing using a median filter and a 3 × 3 pixel kernel. Here, detections are performed in three study regions (off Morocco, the Mozambique Channel, and north-western Australia) and across the Indian Ocean basin using the combination of multiple windows (CMW) method developed by Nieto, Demarcq and McClatchie in 2012 which improves on the original Cayula and Cornillon algorithm. Detections at 4 km and 1 km of resolution are compared. Fronts are divided in two intensity classes (“weak” and “strong”) according to their thermal gradient. A preliminary smoothing is applied prior to the detection using different convolutions: three type of filters (median, average and Gaussian) combined with four kernel sizes (3 × 3, 5 × 5, 7 × 7, and 9 × 9 pixels) and three detection window sizes (16 × 16, 24 × 24 and 32 × 32 pixels) to test the effect of these smoothing combinations on reducing the background noise of the data and therefore on improving the frontal detection. The performance of the combinations on 4 km data are evaluated using two criteria: detection efficiency and front length. We find that the optimal combination of preliminary smoothing parameters in enhancing detection efficiency and preserving front length includes a median filter, a 16 × 16 pixel window size, and a 5 × 5 pixel kernel for strong fronts and a 7 × 7 pixel kernel for weak fronts. Results show an improvement in detection performance (from largest to smallest window size) of 71% for strong fronts and 120% for weak fronts. Despite the small window used (16 × 16 pixels), the length of the fronts has been preserved relative to that found with 1 km data. This optimal preliminary smoothing and the CMW detection algorithm on 4 km sea surface temperature data are then used to describe the spatial distribution of the monthly frequencies of occurrence for both strong and weak fronts across the Indian Ocean basin. In general strong fronts are observed in coastal areas whereas weak fronts, with some seasonal exceptions, are mainly located in the open ocean. This study shows that adequate noise reduction done by a preliminary smoothing of the data considerably improves the frontal detection efficiency as well as the global quality of the results. Consequently, the use of 4 km data enables frontal detections similar to 1 km data (using a standard median 3 × 3 convolution) in terms of detectability, length and location. This method, using 4 km data is easily applicable to large regions or at the global scale with far less constraints of data manipulation and processing time relative to 1 km data.
Resumo:
Within the classification of orbits in axisymmetric stellar systems, we present a new algorithm able to automatically classify the orbits according to their nature. The algorithm involves the application of the correlation integral method to the surface of section of the orbit; fitting the cumulative distribution function built with the consequents in the surface of section of the orbit, we can obtain the value of its logarithmic slope m which is directly related to the orbit’s nature: for slopes m ≈ 1 we expect the orbit to be regular, for slopes m ≈ 2 we expect it to be chaotic. With this method we have a fast and reliable way to classify orbits and, furthermore, we provide an analytical expression of the probability that an orbit is regular or chaotic given the logarithmic slope m of its correlation integral. Although this method works statistically well, the underlying algorithm can fail in some cases, misclassifying individual orbits under some peculiar circumstances. The performance of the algorithm benefits from a rich sampling of the traces of the SoS, which can be obtained with long numerical integration of orbits. Finally we note that the algorithm does not differentiate between the subtypes of regular orbits: resonantly trapped and untrapped orbits. Such distinction would be a useful feature, which we leave for future work. Since the result of the analysis is a probability linked to a Gaussian distribution, for the very definition of distribution, some orbits even if they have a certain nature are classified as belonging to the opposite class and create the probabilistic tails of the distribution. So while the method produces fair statistical results, it lacks in absolute classification precision.
Resumo:
Ochnaceae s.str. (Malpighiales) are a pantropical family of about 500 species and 27 genera of almost exclusively woody plants. Infrafamilial classification and relationships have been controversial partially due to the lack of a robust phylogenetic framework. Including all genera except Indosinia and Perissocarpa and DNA sequence data for five DNA regions (ITS, matK, ndhF, rbcL, trnL-F), we provide for the first time a nearly complete molecular phylogenetic analysis of Ochnaceae s.l. resolving most of the phylogenetic backbone of the family. Based on this, we present a new classification of Ochnaceae s.l., with Medusagynoideae and Quiinoideae included as subfamilies and the former subfamilies Ochnoideae and Sauvagesioideae recognized at the rank of tribe. Our data support a monophyletic Ochneae, but Sauvagesieae in the traditional circumscription is paraphyletic because Testulea emerges as sister to the rest of Ochnoideae, and the next clade shows Luxemburgia+Philacra as sister group to the remaining Ochnoideae. To avoid paraphyly, we classify Luxemburgieae and Testuleeae as new tribes. The African genus Lophira, which has switched between subfamilies (here tribes) in past classifications, emerges as sister to all other Ochneae. Thus, endosperm-free seeds and ovules with partly to completely united integuments (resulting in an apparently single integument) are characters that unite all members of that tribe. The relationships within its largest clade, Ochnineae (former Ochneae), are poorly resolved, but former Ochninae (Brackenridgea, Ochna) are polyphyletic. Within Sauvagesieae, the genus Sauvagesia in its broad circumscription is polyphyletic as Sauvagesia serrata is sister to a clade of Adenarake, Sauvagesia spp., and three other genera. Within Quiinoideae, in contrast to former phylogenetic hypotheses, Lacunaria and Touroulia form a clade that is sister to Quiina. Bayesian ancestral state reconstructions showed that zygomorphic flowers with adaptations to buzz-pollination (poricidal anthers), a syncarpous gynoecium (a near-apocarpous gynoecium evolved independently in Quiinoideae and Ochninae), numerous ovules, septicidal capsules, and winged seeds with endosperm are the ancestral condition in Ochnoideae. Although in some lineages poricidal anthers were lost secondarily, the evolution of poricidal superstructures secured the maintenance of buzz-pollination in some of these genera, indicating a strong selective pressure on keeping that specialized pollination system.