981 results for Unsupervised document classification


Relevance:

20.00%

Publisher:

Abstract:

An unsupervised approach to image segmentation that fuses region and boundary information is presented. The proposed approach combines three strategies: guidance of seed placement, control of the decision criterion, and boundary refinement. The new algorithm uses the boundary information to initialize a set of active regions, which then compete for pixels until the whole image is segmented. The method is implemented on a multiresolution representation, which ensures noise robustness as well as computational efficiency. The accuracy of the segmentation results has been demonstrated through an objective comparative evaluation of the method.

Relevance:

20.00%

Publisher:

Abstract:

A new approach to mammographic mass detection is presented in this paper. Although different algorithms have been proposed for this task, most of them are application dependent. In contrast, our approach makes use of a kindred topic in computer vision adapted to our particular problem: we translate the eigenfaces approach for face detection/classification problems to mass detection. Two different databases were used to show the robustness of the approach. The first consisted of a set of 160 regions of interest (RoIs) extracted from the MIAS database, 40 of them with confirmed masses and the rest normal tissue. The second set of RoIs was extracted from the DDSM database and contained 196 RoIs with masses and 392 with normal but suspicious regions. Initial results demonstrate the feasibility of such an approach, with performance comparable to other algorithms and the advantage of being more general, simple, and cost-effective.
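The eigenfaces idea can be illustrated with a small sketch. It assumes RoIs are flattened into equal-length pixel vectors, keeps a single eigenimage for brevity, and scores a candidate by its reconstruction error (distance from the learned subspace), which is one common eigenfaces detection criterion. The abstract does not state which criterion the authors use, and the helper names `top_component` and `reconstruction_error` are hypothetical.

```python
import math

def top_component(rows, iters=200):
    """Mean and leading principal direction of `rows`, found by power iteration."""
    n, dim = len(rows), len(rows[0])
    mu = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    # Deterministic, non-uniform start vector (assumed not orthogonal to the answer).
    start = [float(i + 1) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in start))
    v = [x / norm for x in start]
    for _ in range(iters):
        # Apply the (unnormalized) covariance without forming it: w = X^T (X v).
        proj = [sum(a * b for a, b in zip(r, v)) for r in centered]
        w = [sum(p * r[j] for p, r in zip(proj, centered)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mu, v

def reconstruction_error(x, mu, v):
    """Distance between x and its projection onto the one-dimensional eigenspace."""
    d = [a - m for a, m in zip(x, mu)]
    coeff = sum(a * b for a, b in zip(d, v))
    resid = [a - coeff * b for a, b in zip(d, v)]
    return math.sqrt(sum(r * r for r in resid))

# Toy "mass" RoIs of four pixels each (made-up values, not MIAS or DDSM data).
train = [[1.0, 1.0, 0.0, 0.0], [2.0, 2.0, 0.0, 0.0],
         [3.0, 3.0, 0.0, 0.0], [4.0, 4.0, 0.0, 0.0]]
mu, v = top_component(train)
```

A candidate RoI with a small reconstruction error resembles the training masses; a large error suggests it lies outside the learned subspace. A practical system would keep several eigenimages and choose the decision threshold on validation data.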

Relevance:

20.00%

Publisher:

Abstract:

We propose a probabilistic object classifier for outdoor scene analysis as a first step in solving the problem of scene context generation. The method begins with a top-down control, which uses previously learned models (appearance and absolute location) to obtain an initial pixel-level classification. This information provides the cores of the objects, which are used to acquire a more accurate object model. Growing these cores by means of specific active regions then yields an accurate recognition of known regions. Next, a general segmentation stage segments the unknown regions using a bottom-up strategy. Finally, the last stage attempts a region fusion of known and unknown segmented objects. The result is both a segmentation of the image and a recognition of each segment as a given object class or as an unknown segmented object. Experimental results are shown and evaluated to support the validity of our proposal.

Relevance:

20.00%

Publisher:

Abstract:

We present a new approach to modeling and classifying breast parenchymal tissue. Given a mammogram, we first discover the distribution of the different tissue densities in an unsupervised manner, and then use this tissue distribution to perform the classification. We achieve this using a classifier based on local descriptors and probabilistic Latent Semantic Analysis (pLSA), a generative model from the statistical text literature. We studied the influence of different descriptors, such as texture and SIFT features, at the classification stage, showing that textons outperform SIFT in all cases. Moreover, we demonstrate that pLSA automatically extracts meaningful latent aspects, generating a compact tissue representation based on their densities that is useful for discriminating between mammogram classes. We show the results of tissue classification on the MIAS and DDSM datasets. We compare our method with approaches that classified these same datasets, showing that our proposal performs better.
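For readers unfamiliar with pLSA, the following is a minimal sketch of the standard EM fit. It assumes each mammogram has already been turned into a vector of visual-word counts (for example, quantized textons); the function name `plsa`, the random initialization, and the toy counts are illustrative assumptions, not the authors' implementation.

```python
import random

def _normalize(row):
    total = sum(row)
    # Fall back to a uniform distribution if a row carries no mass.
    return [x / total for x in row] if total > 0 else [1.0 / len(row)] * len(row)

def plsa(counts, n_aspects, iters=50, seed=0):
    """Fit pLSA by EM; counts[d][w] is how often visual word w occurs in image d."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # P(z|d) and P(w|z), randomly initialized and normalized.
    p_z_d = [_normalize([rng.random() + 0.1 for _ in range(n_aspects)])
             for _ in range(n_docs)]
    p_w_z = [_normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_aspects)]
    for _ in range(iters):
        acc_zd = [[0.0] * n_aspects for _ in range(n_docs)]
        acc_wz = [[0.0] * n_words for _ in range(n_aspects)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = _normalize([p_z_d[d][z] * p_w_z[z][w]
                                   for z in range(n_aspects)])
                for z in range(n_aspects):
                    weight = counts[d][w] * post[z]
                    acc_zd[d][z] += weight
                    acc_wz[z][w] += weight
        # M-step: renormalize the expected counts.
        p_z_d = [_normalize(row) for row in acc_zd]
        p_w_z = [_normalize(row) for row in acc_wz]
    return p_z_d, p_w_z

# Toy counts: four images, four visual words (made-up numbers).
counts = [[5, 5, 0, 0], [4, 6, 0, 0], [0, 0, 5, 5], [0, 0, 6, 4]]
p_z_d, p_w_z = plsa(counts, n_aspects=2)
```

Each row of `p_z_d` is a compact per-image representation (a mixture over latent aspects) that could then be passed to any classifier.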

Relevance:

20.00%

Publisher:

Abstract:

The aim of this report is to classify analytical methods based on flowing media and to define (standardize) terminology. After the classification and a discussion of terms describing the systems and component parts, a section is devoted to terms describing the performance of flow systems. The list of terms included is restricted to the most relevant ones; especially "self-explanatory" terms are left out. It is emphasised that the usage of terms or expressions that do not adequately describe the processes or procedures involved should be strongly discouraged. Although belonging to the category of methods based on flowing media, chromatographic methods are not comprised in the present document. However, care has been taken that the present text is not in conflict with definitions in that domain. In documents in which flow methods are described, it should be clearly indicated how the sample and/or reagent is introduced and how the sample zone is transported. When introducing new techniques in the field, or variants of existing techniques, it is strongly recommended that descriptive terms rather than trivial or elaborate names are used.

Relevance:

20.00%

Publisher:

Abstract:

Arterial stiffness assessed by carotid-femoral pulse wave velocity (cfPWV) measurement is now well accepted as an independent predictor of vascular mortality and morbidity. However, the value of cfPWV has been considered limited for risk classification in patients with several vascular risk factors. Magnetic resonance (MR) allows PWV to be measured between two points, although to date it has mainly been used to study the aorta. The aim was to assess common carotid artery pulse wave velocity by magnetic resonance and to determine its association with classical vascular risk factors and ischemic brain injury burden in patients with suspected ischemic cerebrovascular disease.

Relevance:

20.00%

Publisher:

Abstract:

Breast cancer is the most commonly diagnosed cancer and the leading cause of cancer death among females worldwide. It is considered a highly heterogeneous disease that must be classified into more homogeneous groups. Hence, the purpose of this study was to classify breast tumors based on variations in gene expression patterns derived from RNA sequencing, using different class discovery methods. Paired samples from 42 breast tumors were sequenced on an Illumina Genome Analyzer, and the data were analyzed and prepared with TopHat2 and htseq-count. As reported previously, breast cancer can be grouped into five main groups: a basal epithelial-like group, a HER2 group, a normal breast-like group, and two Luminal groups, each with a distinctive expression profile. When breast tumor samples were classified with the PAM50 method, the most common subtype was Luminal B, which was significantly associated with high expression of ESR1 and ERBB2. The Luminal A subtype showed significantly high expression of ESR1 and SLC39A6, whereas the HER2 subtype showed high expression of the ERBB2 and CCNE1 genes and low luminal epithelial gene expression. Basal-like and normal-like subtypes were associated with low expression of ESR1, PgR and HER2, and showed significantly high expression of cytokeratins 5 and 17. Our results were similar to the TCGA breast cancer data and to known studies of breast cancer classification. Classifying breast tumors could add significant prognostic and predictive information to standard parameters and, moreover, identify marker genes for each subtype to find better therapies for patients with breast cancer.

Relevance:

20.00%

Publisher:

Abstract:

The number of digital images has been increasing exponentially in the last few years. People have problems managing their image collections and finding a specific image. An automatic image categorization system could help them with both tasks. In this thesis, an unsupervised visual object categorization system was implemented to categorize a set of unknown images. Because the system is unsupervised, it does not need labeled training images, which would have to be obtained manually. Therefore, the number of possible categories and images can be huge. The system implemented in the thesis extracts local features from the images. These local features are used to build a codebook. The local features and the codebook are then used to generate a feature vector for each image. Images are categorized based on the feature vectors. The system is able to categorize any given set of images based on their visual appearance. Images that have similar image regions are grouped together in the same category. Thus, for example, images that contain cars are assigned to the same cluster. The unsupervised visual object categorization system can be used in many situations, e.g., in an Internet search engine. The system can categorize images for a user, and the user can then easily find a specific type of image.
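The pipeline described above (local features, codebook, per-image feature vector, grouping) can be sketched as follows. The sketch assumes local descriptors have already been extracted as numeric vectors and uses plain k-means both for the codebook and for the final grouping; the abstract does not name the clustering method, so that choice and every function name here are assumptions.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old center if a cluster becomes empty
                centers[i] = [sum(c) / len(g) for c in zip(*g)]
    return centers

def bow_histogram(descriptors, codebook):
    """Normalized histogram of codebook entries hit by an image's descriptors."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[nearest(d, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def categorize(images, codebook_size, n_categories):
    """Assign each image (a list of descriptors) to a category index."""
    all_descriptors = [d for img in images for d in img]
    codebook = kmeans(all_descriptors, codebook_size)
    hists = [bow_histogram(img, codebook) for img in images]
    category_centers = kmeans(hists, n_categories)
    return [nearest(h, category_centers) for h in hists]

# Toy 2-D "descriptors" for four images (made-up values).
images = [
    [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0]],
    [[0.1, 0.1], [0.0, 0.2]],
    [[5.0, 5.1], [5.1, 5.0]],
    [[4.9, 5.0], [5.0, 5.2], [5.1, 5.1]],
]
labels = categorize(images, codebook_size=2, n_categories=2)
```

Here the first two images end up in one category and the last two in another. Real descriptors (for example, 128-dimensional SIFT vectors) and much larger codebooks would follow the same flow.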

Relevance:

20.00%

Publisher:

Abstract:

In the last two decades, cases of corruption have been unveiled in different countries, raising public awareness and reinforcing a trend in which society expects more from its leaders. Our objective in this paper is to examine the effects of corruption and seigniorage on inflation and growth rates. The model used in this article is an extension of the model used by Huang and Wei (2006). One of our findings is that, under some conditions, corruption has a positive impact on the growth rate. JEL classification: D73, E52, E58, E62. Keywords: Corruption; Fiscal Policy; Growth; Monetary Policy; Seigniorage.

Relevance:

20.00%

Publisher:

Abstract:

A class of three-sided markets (and games) is considered, where value is generated by pairs or triplets of agents belonging to different sectors, as well as by individuals. For these markets we analyze the situation that arises when some agents leave the market with some payoff. To this end, we introduce the derived market (and game) and relate it to the Davis and Maschler (1965) reduced game. Consistency with respect to the derived market, together with singleness best and individual anti-monotonicity, axiomatically characterizes the core for these generalized three-sided assignment markets. These markets may have an empty core, but we define a balanced subclass in which the worth of each triplet is defined as the sum of the worths of the pairs it contains. Keywords: Multi-sided assignment market, Consistency, Core, Nucleolus. JEL Classification: C71, C78

Relevance:

20.00%

Publisher:

Abstract:

The aim of this project is to develop a technology that automatically codes large amounts of text so that it can subsequently be visualized and analyzed in an application designed in Qlikview. The research and implementation were motivated by the still incipient presence of computing technologies in coding processes for political science. The program therefore aims to automate a process that is commonly carried out by hand, so the advantages of introducing computational techniques are considerable. Such automation saves coding time as well as financial and human resources. A theoretical and methodological review was carried out and served as an instrument for study and improvement, with the firm purpose of minimizing the margin of error and offering a quality tool with real market potential. The classification method used was Bayes, implemented in Matlab. The classification results reached rates of 99.2%. In the visualization and analysis in Qlikview, the parameters for political party, year, category, or region can be modified, which makes it possible to analyze numerous aspects of how words are distributed among the different categories and over time.
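As an illustration of Bayes-based text coding, here is a minimal multinomial naive Bayes sketch with Laplace smoothing. It is written in Python rather than Matlab, and the exact Bayes variant, the tokenization, and the category labels are assumptions, since the abstract gives none of these details.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (text, label) pairs. Returns the counts a naive Bayes model needs."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log posterior for text."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n in class_docs.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)  # log prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Laplace-smoothed log likelihood of the word given the class.
                score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical hand-coded sentences with made-up category labels.
training = [
    ("tax cuts and public budget", "economy"),
    ("budget deficit and tax reform", "economy"),
    ("hospital funding and public health", "health"),
    ("health care for patients", "health"),
]
model = train_nb(training)
```

With this toy model, `predict_nb(model, "tax reform budget")` returns `"economy"` and `predict_nb(model, "patients health")` returns `"health"`.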

Relevance:

20.00%

Publisher:

Abstract:

During the last two decades, skill mismatches have become one of the most important issues of policy concern in the EU (European Commission, 2008). Hence, the literature has stressed the need to reduce skill mismatches. We contribute to this literature by analyzing the impact of the transition from salaried employment to self-employment on self-reported skill mismatches. To do so, we use the European Community Household Panel (ECHP) covering the period 1994–2001. Using panel data, we track individuals over time and measure their self-reported skill mismatch before and after the transition. Our empirical findings indicate not only that the average self-employed worker is less likely to report being skill-mismatched but also that individuals who move from salaried employment to self-employment reduce their probability of skill mismatch after the transition. Keywords: Self-employment, skill mismatches, salaried employment. JEL Classification: L26, J24, B23

Relevance:

20.00%

Publisher:

Abstract:

Learning of preference relations has recently received significant attention in the machine learning community. It is closely related to classification and regression analysis and can be reduced to these tasks. However, preference learning involves predicting an ordering of the data points rather than a single numerical value, as in regression, or a class label, as in classification. Therefore, studying preference relations within a separate framework not only facilitates a better theoretical understanding of the problem but also motivates the development of efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, and natural language processing. For example, algorithms that learn to rank are frequently used in search engines to order the documents retrieved by a query. Preference learning methods have also been applied to collaborative filtering problems, predicting individual customer choices from vast amounts of user-generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from the well-founded and robust class of regularized least-squares methods and have many attractive computational properties. To improve the performance of our methods, we introduce several non-linear kernel functions. Thus, the contribution of this thesis is twofold: kernel functions for structured data, which take advantage of various non-vectorial data representations, and preference learning algorithms suited to different tasks, namely efficient learning of preference relations, learning with large amounts of training data, and semi-supervised preference learning. The proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in the bioinformatics domain.
Training kernel-based ranking algorithms can be infeasible when the training set is large. We address this problem by proposing a preference learning algorithm whose computational complexity scales linearly with the number of training data points. We also introduce a sparse approximation of the algorithm that can be trained efficiently on large amounts of data. For situations in which only a small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, the proposed algorithms lead to notably better performance in many of the preference learning tasks considered.
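To make the link between regularized least squares and preference learning concrete, here is a minimal linear sketch: each preference "item i ranks above item j" becomes a regression target of 1 on the feature difference x_i - x_j, and the weights solve the ridge normal equations. This is a generic pairwise formulation under a linear kernel, not the thesis's own algorithms (which use kernels and more efficient training); all function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_pairwise_rls(features, preferences, lam=0.1):
    """preferences: list of (i, j) pairs meaning item i should rank above item j."""
    dim = len(features[0])
    diffs = [[a - b for a, b in zip(features[i], features[j])]
             for i, j in preferences]
    # Normal equations of ridge regression on the differences:
    # (D^T D + lam * I) w = D^T 1, with lam > 0 keeping the matrix invertible.
    gram = [[sum(d[p] * d[q] for d in diffs) + (lam if p == q else 0.0)
             for q in range(dim)] for p in range(dim)]
    rhs = [sum(d[p] for d in diffs) for p in range(dim)]
    return solve(gram, rhs)

def score(w, x):
    """Ranking score of item x; higher means more preferred."""
    return sum(a * b for a, b in zip(w, x))

# Toy items with two features and a few observed preferences (made-up data).
features = [[3.0, 0.5], [2.0, 0.1], [1.0, 0.9], [0.0, 0.3]]
preferences = [(0, 1), (1, 2), (2, 3), (0, 3)]
w = fit_pairwise_rls(features, preferences)
ranking = sorted(range(len(features)), key=lambda i: -score(w, features[i]))
```

Sorting items by `score` reproduces the training preferences on this toy data. Note that the number of pairs can grow quadratically with the number of items, which is the kind of scaling issue the abstract says its algorithms address.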

Relevance:

20.00%

Publisher:

Abstract:

During the first decade of this century, Spain experienced the most important economic and housing boom in its recent history. This situation led the lending industry to expand dramatically through the mortgage market. Intense competition among lenders caused a sharp lowering of credit standards. During this period, lenders operating in the Spanish mortgage market artificially inflated appraised home values in order to grant larger mortgages. By doing this, lenders gave financially constrained households access to mortgage credit. In this paper, we analyze this phenomenon for the first time. To do so, we use a unique dataset of matched mortgage-dwelling-borrower characteristics covering the period 2004–2010. Our data allow us to construct an unbiased measure of property over-appraisal, since the transaction prices in our data also include any potential side payments in the transactions. Our findings indicate that i) in Spain, appraised home values were inflated on average by around 30% with respect to transaction prices; ii) credit-constrained households were more likely to be involved in mortgages with inflated house values; and iii) a regional indicator of competition in the lending market suggests that inflated appraisal values were also more likely to appear in more competitive regional mortgage markets. Keywords: Housing demand, appraisal values, house prices, housing bubble, credit constraints, mortgage market. JEL Classification: R21, R31

Relevance:

20.00%

Publisher:

Abstract:

This paper examines the extent to which innovative Spanish firms pursue improvements in energy efficiency (EE) as an objective of innovation. The increase in energy consumption and its impact on greenhouse gas emissions justifies the greater attention being paid to energy efficiency and especially to industrial EE. The ability of manufacturing companies to innovate and improve their EE has a substantial influence on attaining objectives regarding climate change mitigation. Despite the effort to design more efficient energy policies, the EE determinants in manufacturing firms have been little studied in the empirical literature. From an exhaustive sample of Spanish manufacturing firms and using a logit model, we examine the energy efficiency determinants for those firms that have innovated. To carry out the econometric analysis, we use panel data from the Community Innovation Survey for the period 2008‐2011. Our empirical results underline the role of size among the characteristics of firms that facilitate energy efficiency innovation. Regarding company behaviour, firms that consider the reduction of environmental impacts to be an important objective of innovation and that have introduced organisational innovations are more likely to innovate with the objective of increasing energy efficiency. Keywords: energy efficiency, corporate targets, innovation, Community Innovation Survey. JEL Classification: Q40, Q55, O31