22 results for Data classification

in Consorci de Serveis Universitaris de Catalunya (CSUC), Spain


Relevance: 40.00%

Abstract:

The classical binary classification problem is investigated when it is known in advance that the posterior probability function (or regression function) belongs to some class of functions. We introduce and analyze a method which effectively exploits this knowledge. The method is based on minimizing the empirical risk over a carefully selected "skeleton" of the class of regression functions. The skeleton is a covering of the class based on a data-dependent metric, especially fitted for classification. A new scale-sensitive dimension is introduced which is more useful for the studied classification problem than other, previously defined, dimension measures. This fact is demonstrated by performance bounds for the skeleton estimate in terms of the new dimension.
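As a rough illustration of empirical risk minimization over a finite skeleton, the sketch below picks, from a small hand-made list of candidate regression functions, the one whose plug-in classifier makes the fewest errors on a sample. The candidate list, the sample and all function names are hypothetical, and the data-dependent covering construction described in the abstract is not reproduced.

```python
from typing import Callable, List, Sequence, Tuple

Regression = Callable[[float], float]  # candidate estimate of P(Y=1 | X=x)


def plug_in(eta: Regression, x: float) -> int:
    """Classify as 1 when the candidate posterior is at least 1/2."""
    return 1 if eta(x) >= 0.5 else 0


def empirical_risk(eta: Regression, sample: Sequence[Tuple[float, int]]) -> float:
    """Fraction of sample points misclassified by the plug-in rule."""
    errors = sum(1 for x, y in sample if plug_in(eta, x) != y)
    return errors / len(sample)


def erm_over_skeleton(skeleton: List[Regression],
                      sample: Sequence[Tuple[float, int]]) -> Regression:
    """Return the skeleton element with the smallest empirical risk."""
    return min(skeleton, key=lambda eta: empirical_risk(eta, sample))


# Hypothetical skeleton: step-shaped posteriors with different thresholds.
skeleton = [lambda x, t=t: 0.9 if x >= t else 0.1 for t in (0.2, 0.5, 0.8)]
# Hypothetical labelled sample of (x, y) pairs.
sample = [(0.1, 0), (0.3, 0), (0.45, 0), (0.55, 1), (0.7, 1), (0.9, 1)]

best = erm_over_skeleton(skeleton, sample)
```

Because the minimization runs over a finite list, the search itself is trivial; the difficulty addressed in the abstract lies in choosing a skeleton that is small yet still covers the function class well.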

Relevance: 30.00%

Abstract:

We construct estimates of educational attainment for a sample of OECD countries using previously unexploited sources. We follow a heuristic approach to obtain plausible time profiles for attainment levels by removing sharp breaks in the data that seem to reflect changes in classification criteria. We then construct indicators of the information content of our series and a number of previously available data sets and examine their performance in several growth specifications. We find a clear positive correlation between data quality and the size and significance of human capital coefficients in growth regressions. Using an extension of the classical errors in variables model, we construct a set of meta-estimates of the coefficient of years of schooling in an aggregate Cobb-Douglas production function. Our results suggest that, after correcting for measurement error bias, the value of this parameter is well above 0.50.
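For intuition about the measurement error bias mentioned above, the textbook single-regressor errors-in-variables case can be written as follows. This is the classical model only, not the authors' extension of it, and the symbols $s^{*}$, $s$, $u$ and $\beta$ are notation assumed here for illustration.

```latex
% Classical errors-in-variables with one regressor (assumed notation):
% s^* is true schooling, s is observed schooling, and u is measurement
% error uncorrelated with s^* and with the regression disturbance.
s = s^{*} + u, \qquad
\operatorname{plim}\hat{\beta}_{\mathrm{OLS}}
  = \beta \,\frac{\sigma^{2}_{s^{*}}}{\sigma^{2}_{s^{*}} + \sigma^{2}_{u}}
```

The ratio on the right lies between 0 and 1, so noisier schooling data pull the estimated coefficient towards zero. This is in line with the reported positive correlation between data quality and coefficient size.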

Relevance: 30.00%

Abstract:

Landscape classification tackles issues related to the representation and analysis of continuous and variable ecological data. In this study, a methodology is created in order to define topo-climatic landscapes (TCL) in the north-west of Catalonia (north-east of the Iberian Peninsula). TCLs relate the ecological behaviour of a landscape in terms of topography, physiognomy and climate, which together make up the main drivers of an ecosystem. Selected variables are derived from different sources such as remote sensing and climatic atlases. The proposed methodology combines unsupervised iterative cluster classification with a supervised fuzzy classification. As a result, 28 TCLs have been found for the study area, which may be differentiated in terms of vegetation physiognomy and vegetation altitudinal range type. Furthermore, a hierarchy among TCLs is set, enabling the merging of clusters and allowing for changes of scale. Through the topo-climatic landscape map, managers may identify patches with similar environmental conditions and at the same time assess the uncertainty involved.

Relevance: 30.00%

Abstract:

Land cover classification is a key research field in remote sensing and land change science as thematic maps derived from remotely sensed data have become the basis for analyzing many socio-ecological issues. However, land cover classification remains a difficult task and it is especially challenging in heterogeneous tropical landscapes where nonetheless such maps are of great importance. The present study aims to establish an efficient classification approach to accurately map all broad land cover classes in a large, heterogeneous tropical area of Bolivia, as a basis for further studies (e.g., land cover-land use change). Specifically, we compare the performance of parametric (maximum likelihood), non-parametric (k-nearest neighbour and four different support vector machines - SVM), and hybrid classifiers, using both hard and soft (fuzzy) accuracy assessments. In addition, we test whether the inclusion of a textural index (homogeneity) in the classifications improves their performance. We classified Landsat imagery for two dates corresponding to dry and wet seasons and found that non-parametric, and particularly SVM classifiers, outperformed both parametric and hybrid classifiers. We also found that the use of the homogeneity index along with reflectance bands significantly increased the overall accuracy of all the classifications, but particularly of SVM algorithms. We observed that improvements in producer’s and user’s accuracies through the inclusion of the homogeneity index were different depending on land cover classes. Early-growth/degraded forests, pastures, grasslands and savanna were the classes most improved, especially with the SVM radial basis function and SVM sigmoid classifiers, though with both classifiers all land cover classes were mapped with producer’s and user’s accuracies of around 90%.
Our approach seems very well suited to accurately map land cover in tropical regions, thus having the potential to contribute to conservation initiatives, climate change mitigation schemes such as REDD+, and rural development policies.
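The abstract does not say how the homogeneity index was computed. A minimal sketch, assuming the commonly used grey-level co-occurrence matrix (GLCM) definition, the sum of p(i, j) / (1 + (i - j)^2), is given below. The function name, the pixel offset and the toy windows are all hypothetical; the window size and grey-level quantization actually used are not given here.

```python
from collections import Counter
from typing import List


def glcm_homogeneity(window: List[List[int]], dx: int = 1, dy: int = 0) -> float:
    """Homogeneity of a grey-level window for the pixel offset (dx, dy).

    Pairs are counted in one direction only (non-symmetric GLCM), then
    normalised to probabilities p(i, j); the result is the sum of
    p(i, j) / (1 + (i - j) ** 2).
    """
    rows, cols = len(window), len(window[0])
    pairs = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                pairs[(window[r][c], window[r2][c2])] += 1
    total = sum(pairs.values())
    return sum(n / (1 + (i - j) ** 2) for (i, j), n in pairs.items()) / total


flat = [[3, 3, 3], [3, 3, 3], [3, 3, 3]]      # uniform patch
stripes = [[0, 2, 0], [0, 2, 0], [0, 2, 0]]   # alternating columns
```

A uniform patch reaches the maximum value of 1, while strongly contrasting neighbours push the value towards 0. Computed per pixel over a moving window, such a value could be stacked with the reflectance bands as an extra input feature.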

Relevance: 30.00%

Abstract:

We use historical data that cover more than one century of real GDP for industrial countries and employ the Pesaran panel unit root test, which allows for cross-sectional dependence, to test for a unit root in real GDP. We find strong evidence against the unit root null. Our results are robust to the chosen group of countries and the sample period. Key words: real GDP stationarity, cross-sectional dependence, CIPS test. JEL Classification: C23, E32

Relevance: 30.00%

Abstract:

We investigate whether dimensionality reduction using a latent generative model is beneficial for the task of weakly supervised scene classification. In detail, we are given a set of labeled images of scenes (for example, coast, forest, city, river, etc.), and our objective is to classify a new image into one of these categories. Our approach consists of first discovering latent "topics" using probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature here applied to a bag of visual words representation for each image, and subsequently, training a multiway classifier on the topic distribution vector for each image. We compare this approach to that of representing each image by a bag of visual words vector directly and training a multiway classifier on these vectors. To this end, we introduce a novel vocabulary using dense color SIFT descriptors and then investigate the classification performance under changes in the size of the visual vocabulary, the number of latent topics learned, and the type of discriminative classifier used (k-nearest neighbor or SVM). We achieve superior classification performance to recent publications that have used a bag of visual word representation, in all cases, using the authors' own data sets and testing protocols. We also investigate the gain in adding spatial information. We show applications to image retrieval with relevance feedback and to scene classification in videos.

Relevance: 30.00%

Abstract:

It has been shown that the accuracy of mammographic abnormality detection methods is strongly dependent on the breast tissue characteristics, where a dense breast drastically reduces detection sensitivity. In addition, breast tissue density is widely accepted to be an important risk indicator for the development of breast cancer. Here, we describe the development of an automatic breast tissue classification methodology, which can be summarized in a number of distinct steps: 1) the segmentation of the breast area into fatty versus dense mammographic tissue; 2) the extraction of morphological and texture features from the segmented breast areas; and 3) the use of a Bayesian combination of a number of classifiers. The evaluation, based on a large number of cases from two different mammographic data sets, shows a strong correlation ( and 0.67 for the two data sets) between automatic and expert-based Breast Imaging Reporting and Data System mammographic density assessment.

Relevance: 30.00%

Abstract:

The classification of art painting images is a computer vision application that is growing considerably. The goal of this technology is to classify an art painting image automatically, in terms of artistic style, technique used, or author. For this purpose, the image is analyzed by extracting some visual features. Many articles related to these problems have been published, but in general the proposed solutions are focused on a very specific field. In particular, algorithms are tested using images at different resolutions, acquired under different illumination conditions. That complicates the performance comparison of the different methods. In this context, it would be very useful to construct a public art image database, in order to compare all the existing algorithms under the same conditions. This paper presents a large art image database, with corresponding labels according to the following characteristics: title, author, style and technique. Furthermore, a tool that manages this database has been developed, and it can be used to extract different visual features for any selected image. This data can be exported to a file in CSV format, allowing researchers to analyze the data with other tools. During the data collection, the tool stores the elapsed time of the calculation. Thus, this tool also allows the efficiency, in computation time, of different mathematical procedures for extracting image data to be compared.

Relevance: 30.00%

Abstract:

Automatic classification of makams from symbolic data is a rarely studied topic. In this paper, first a review of an n-gram based approach is presented using various representations of the symbolic data. While a high degree of precision can be obtained, confusion occurs mainly between makams that use (almost) the same scale and pitch hierarchy but differ in overall melodic progression (seyir). To further improve the system, first n-gram based classification is tested for various sections of the piece, to take into account a feature of the seyir: that melodic progression starts in a certain region of the scale. In a second test, a hierarchical classification structure is designed which uses n-grams and seyir features at different levels to further improve the system.
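To make the n-gram idea concrete, the sketch below counts n-grams per class and assigns a new sequence to the class whose add-one-smoothed counts give it the highest log-likelihood. This is a generic n-gram classifier under assumed choices (the smoothing, the scoring rule and the toy symbol sequences are all hypothetical), not the representation or scoring used in the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def ngrams(seq: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a symbol sequence."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def train(pieces: Dict[str, List[Sequence[str]]], n: int) -> Dict[str, Counter]:
    """Count n-grams per class label."""
    return {label: Counter(g for piece in seqs for g in ngrams(piece, n))
            for label, seqs in pieces.items()}


def classify(seq: Sequence[str], models: Dict[str, Counter], n: int) -> str:
    """Pick the label whose add-one-smoothed n-gram counts score seq highest."""
    vocab = {g for counts in models.values() for g in counts}
    vocab.update(ngrams(seq, n))

    def log_score(counts: Counter) -> float:
        total = sum(counts.values()) + len(vocab)
        return sum(math.log((counts[g] + 1) / total) for g in ngrams(seq, n))

    return max(models, key=lambda label: log_score(models[label]))


# Hypothetical toy data: two classes sharing a scale but moving differently.
training = {
    "ascending": [["A", "B", "C", "D", "E"], ["B", "C", "D", "E", "F"]],
    "descending": [["E", "D", "C", "B", "A"], ["F", "E", "D", "C", "B"]],
}
models = train(training, n=2)
```

The two toy classes use the same symbols and differ only in the order of movement, which is the kind of distinction that unigram statistics cannot capture and bigrams can.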

Relevance: 30.00%

Abstract:

The absolute K magnitudes and kinematic parameters of about 350 oxygen-rich Long-Period Variable stars are calibrated, by means of an up-to-date maximum-likelihood method, using HIPPARCOS parallaxes and proper motions together with radial velocities and, as additional data, periods and V-K colour indices. Four groups, differing by their kinematics and mean magnitudes, are found. For each of them, we also obtain the distributions of magnitude, period and de-reddened colour of the base population, as well as de-biased period-luminosity-colour relations and their two-dimensional projections. The SRa semiregulars do not seem to constitute a separate class of LPVs. The SRb appear to belong to two populations of different ages. In a PL diagram, they constitute two evolutionary sequences towards the Mira stage. The Miras of the disk appear to pulsate on a lower-order mode. The slopes of their de-biased PL and PC relations are found to be very different from the ones of the Oxygen Miras of the LMC. This suggests that a significant number of so-called Miras of the LMC are misclassified. This also suggests that the Miras of the LMC do not constitute a homogeneous group, but include a significant proportion of metal-deficient stars, suggesting a relatively smooth star formation history. As a consequence, one may not trivially transpose the LMC period-luminosity relation from one galaxy to the other.

Relevance: 30.00%

Abstract:

During the period 1996-2000, forty-three heavy rainfall events were detected in the Internal Basins of Catalonia (northeastern Spain). Most of these events caused floods and serious damage. This high number leads to the need for a methodology to classify them, on the basis of their surface rainfall distribution, their internal organization and their physical features. The aim of this paper is to show a methodology to analyze systematically the convective structures responsible for those heavy rainfall events on the basis of the information supplied by the meteorological radar. The proposed methodology is as follows. Firstly, the rainfall intensity and the surface rainfall pattern are analyzed on the basis of the raingauge data. Secondly, the convective structures at the lowest level are identified and characterized by using a 2-D algorithm, and the convective cells are identified by using a 3-D procedure that looks for the reflectivity cores in every radar volume. Thirdly, the convective cells (3-D) are associated with the 2-D structures (convective rainfall areas). This methodology has been applied to the 43 heavy rainfall events using the meteorological radar located near Barcelona and the SAIH automatic raingauge network.
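The 2-D identification step can be pictured as grouping neighbouring radar pixels whose reflectivity exceeds a threshold. The sketch below does this with a plain 4-connected flood fill. The grid values, the 40 dBZ threshold and the function name are assumptions for illustration, since the abstract does not specify the algorithm or its parameters.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def label_regions(field: List[List[float]], threshold: float) -> List[Set[Cell]]:
    """Group grid cells with value >= threshold into 4-connected regions."""
    rows, cols = len(field), len(field[0])
    seen: Set[Cell] = set()
    regions: List[Set[Cell]] = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or field[r][c] < threshold:
                continue
            # Iterative flood fill from an unvisited cell above the threshold.
            region, stack = set(), [(r, c)]
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and field[ny][nx] >= threshold):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            regions.append(region)
    return regions


# Hypothetical low-level reflectivity grid (dBZ) and threshold.
reflectivity = [
    [10, 45, 47, 12, 8],
    [11, 44, 15, 10, 9],
    [9, 12, 10, 50, 52],
    [8, 10, 11, 48, 13],
]
regions = label_regions(reflectivity, threshold=40.0)
```

Each returned set of grid cells could then be summarized (area, maximum reflectivity, centroid) and matched against cells found in the 3-D volume.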

Relevance: 30.00%

Abstract:

Drift is an important issue that impairs the reliability of gas sensing systems. Sensor aging, memory effects and environmental disturbances produce shifts in sensor responses that make initial statistical models for gas or odor recognition useless after a relatively short period (typically a few weeks). Frequent recalibrations are needed to preserve system accuracy. However, when recalibrations involve numerous samples they become expensive and laborious. An interesting and lower-cost alternative is drift counteraction by signal processing techniques. Orthogonal Signal Correction (OSC) is proposed for drift compensation in chemical sensor arrays. The performance of OSC is also compared with Component Correction (CC). A simple classification algorithm has been employed for assessing the performance of the algorithms on a dataset composed of measurements of three analytes using an array of seventeen conductive polymer gas sensors over a ten-month period.
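The core operation behind component-style drift correction is removing, from each measurement, its projection onto an estimated drift direction. A minimal sketch follows, assuming the direction is already known (for example from a PCA of reference measurements). The three-sensor vectors and function names are hypothetical, and OSC itself is not implemented here.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def remove_component(x: Sequence[float], direction: Sequence[float]) -> List[float]:
    """Subtract from x its projection onto the (normalised) drift direction."""
    p = unit(direction)
    score = sum(a * b for a, b in zip(x, p))
    return [a - score * b for a, b in zip(x, p)]


# Hypothetical 3-sensor example: the same sample measured before and after
# drift, where drift shifts the response along `drift_direction`.
drift_direction = [1.0, 1.0, 0.0]
early = [0.2, -0.2, 1.0]
late = [a + 0.5 * d for a, d in zip(early, drift_direction)]

early_corrected = remove_component(early, drift_direction)
late_corrected = remove_component(late, drift_direction)
```

After correction the early and late measurements coincide, so a classifier trained on early data would see the late sample in the same place. The cost is that any discriminative information lying along the drift direction is removed as well.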

Relevance: 30.00%

Abstract:

A new drift compensation method based on Common Principal Component Analysis (CPCA) is proposed. The drift variance in the data is found as the principal components computed by CPCA. This method finds components that are common to all gases in feature space. The method is compared, in a classification task, with previously published approaches in which the drift direction is estimated through a Principal Component Analysis (PCA) of a reference gas. The proposed method, which employs no specific reference gas but information from all gases, has shown the same performance as the traditional approach with the best-fitted reference gas. Results are shown for data spanning 7 months, including three gases at different concentrations, for an array of 17 polymeric sensors.

Relevance: 30.00%

Abstract:

Biometric system performance can be improved by means of data fusion. Several kinds of information can be fused in order to obtain a more accurate classification (identification or verification) of an input sample. In this paper we present a method for computing the weights in a weighted sum fusion for score combinations, by means of a likelihood model. The maximum likelihood estimation is set as a linear programming problem. The scores are derived from a GMM classifier working on a different feature extractor. Our experimental results assessed the robustness of the system to changes over time (different sessions) and to a change of microphone. The improvements obtained were significantly better (error bars of two standard deviations) than a uniform weighted sum, a uniform weighted product, or the best single classifier. The proposed method scales computationally with the number of scores to be fused as the simplex method for linear programming.
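The fusion rule itself is a convex combination of per-classifier scores. The sketch below shows only that step, with hand-picked weights. The score values, the weights and the function name are hypothetical, and the likelihood-based linear program that the paper uses to estimate the weights is not reproduced.

```python
from typing import Sequence


def fuse(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted-sum fusion of per-classifier scores; weights must sum to 1."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical scores from three classifiers for one input sample.
scores = [0.9, 0.4, 0.6]
learned = fuse(scores, [0.5, 0.3, 0.2])      # illustrative non-uniform weights
uniform = fuse(scores, [1 / 3, 1 / 3, 1 / 3])  # the uniform baseline
```

With non-uniform weights the more reliable classifier dominates the fused score, which is the effect that a learned weighting aims for over the uniform baseline.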

Relevance: 30.00%

Abstract:

The glasses of the rosette forming the main window of the transept of the Gothic Cathedral of Tarragona have been characterised by means of SEM/EDS, XRD, FTIR and electron microprobe. The multivariate statistical treatment of these data allows a classification of the samples to be established, forming groups that have historical significance and reflect ancient restorations. Furthermore, the decay patterns and mechanisms have been determined and the weathering by-products characterised. A clear influence of bioactivity on the decay of these glasses has been demonstrated; this activity is partially controlled by the chemical composition of the glasses.