981 results for Unsupervised document classification


Relevance:

30.00%

Publisher:

Abstract:

Airborne LIght Detection And Ranging (LIDAR) provides accurate height information for objects on the earth, which has made LIDAR increasingly popular in terrain and land surveying. In particular, LIDAR data offer vital features for land-cover classification, an important task in many application domains. In this paper, an unsupervised approach based on an improved fuzzy Markov random field (FMRF) model is developed, by which the LIDAR data, co-registered images acquired by optical sensors (i.e. aerial color and near-infrared images), and other derived features are fused effectively to improve the accuracy of land-cover classification from the LIDAR system. In the proposed FMRF model-based approach, spatial contextual information is exploited by modeling the image as a Markov random field (MRF), while fuzzy logic is introduced to reduce the errors caused by hard classification. Moreover, a Lagrange multiplier (LM) algorithm is employed to compute a maximum a posteriori (MAP) estimate of the classification. The experimental results show that fusing the height data and optical images is particularly well suited to land-cover classification. The proposed approach performs very well on airborne LIDAR data fused with its co-registered optical images, improving the average accuracy to 88.9%.
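As a hedged illustration of the fused-feature idea (not the paper's exact FMRF/LM-MAP algorithm), the sketch below computes fuzzy c-means memberships over a stacked height-plus-optical feature vector per pixel, then applies a single ICM-style smoothing sweep as a crude stand-in for the MRF spatial prior; all shapes, data and parameters are synthetic assumptions.

```python
# Minimal sketch: fuse LIDAR height with optical bands, classify with fuzzy
# memberships, then apply one MRF-style spatial smoothing pass (ICM sweep).
# Synthetic stand-in data; not the paper's FMRF/Lagrange-multiplier method.
import numpy as np

rng = np.random.default_rng(0)
H, W, K, m = 64, 64, 4, 2.0                  # image size, classes, fuzzifier

# Fused per-pixel feature stack: height, R, G, B, NIR (synthetic stand-ins)
feats = rng.normal(size=(H, W, 5))
X = feats.reshape(-1, 5)

# --- fuzzy c-means (stand-in for the paper's fuzzy component) ---
centers = X[rng.choice(len(X), K, replace=False)]
for _ in range(20):
    d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-9
    U = d ** (-2 / (m - 1))
    U /= U.sum(1, keepdims=True)             # per-pixel class memberships
    centers = (U.T ** m @ X) / (U.T ** m).sum(1, keepdims=True)

# --- one ICM-style sweep: log-membership data term + neighbor-count prior ---
beta = 1.5
labels = U.argmax(1).reshape(H, W)
logU = np.log(U + 1e-9).reshape(H, W, K)
for i in range(1, H - 1):
    for j in range(1, W - 1):
        nb = labels[i - 1:i + 2, j - 1:j + 2]
        prior = np.array([(nb == k).sum() for k in range(K)])
        labels[i, j] = (logU[i, j] + beta * prior).argmax()
print(labels.shape, np.bincount(labels.ravel(), minlength=K))
```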

Relevance:

30.00%

Publisher:

Abstract:

The motivation for this thesis work is the need to improve equipment reliability and quality of service for railway passengers, as well as the requirement for cost-effective and efficient condition-maintenance management in rail transportation. This thesis develops a fusion of various machine vision analysis methods to achieve high performance in the automation of wooden rail track inspection. Condition monitoring in rail transport is done manually by a human operator, who relies on inference and assumptions to draw conclusions. Condition monitoring allows maintenance to be scheduled, or other actions to be taken, to avoid the consequences of failure before it occurs. Manual or automated condition monitoring of materials in public transportation fields such as railways, aerial navigation and traffic safety, where safety is of prime importance, requires non-destructive testing (NDT). In general, wooden railway sleeper inspection is performed manually by a human operator, who moves along the rail sleepers and gathers information by visual and sound analysis to examine them for cracks; inspectors working on the lines visually inspect wooden sleepers to judge their quality. In this project a machine vision system is developed based on the manual visual analysis procedure, using digital cameras and image processing software to perform comparable inspections. Manual inspection requires considerable effort, is prone to error, and frequent changes in the inspected material make discrimination difficult even for a human operator. The machine vision system classifies the condition of the material by examining individual pixels of images, processing them, and drawing conclusions with the assistance of knowledge bases and extracted features. A pattern recognition approach is developed from the methodological knowledge of the manual procedure, realized as a non-destructive testing method to identify the flaws missed in manual condition monitoring of sleepers. In this method, a test vehicle is designed to capture sleeper images in a manner similar to visual inspection by a human operator, and the captured images of the wooden sleepers provide the raw data for the pattern recognition approach. The data from the NDT method were further processed and appropriate features were extracted, with the aim of achieving reliable classification results with high accuracy. A key idea is to use an unsupervised classifier based on the extracted features to discriminate the condition of wooden sleepers into either good or bad; a self-organising map is used as the classifier. To achieve greater integration, the data collected by the machine vision system were combined by a strategy called fusion, examined at two levels: sensor-level fusion and feature-level fusion. As the goal was to reduce human error in classifying rail sleepers as good or bad, the results obtained by feature-level fusion, compared with the actual classifications, were satisfactory.
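A minimal self-organising-map sketch of the unsupervised good/bad grouping step, under assumed inputs: the six-dimensional feature vectors below are synthetic stand-ins for the features extracted from sleeper images, and the grid size and training schedules are illustrative, not the thesis settings.

```python
# Tiny numpy SOM: map sleeper feature vectors onto a small grid whose units
# act as condition bins (e.g. "good" vs "bad" regions of the map).
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 6)),     # "good" cluster (synthetic)
               rng.normal(1.0, 0.3, (50, 6))])    # "bad" cluster (synthetic)

grid, dim, iters = (4, 4), X.shape[1], 2000
W = rng.normal(size=grid + (dim,))
coords = np.stack(np.meshgrid(*map(np.arange, grid), indexing="ij"), -1)

for t in range(iters):
    lr = 0.5 * (1 - t / iters)                    # decaying learning rate
    sigma = 2.0 * (1 - t / iters) + 0.5           # decaying neighbourhood width
    x = X[rng.integers(len(X))]
    bmu = np.unravel_index(np.argmin(((W - x) ** 2).sum(-1)), grid)
    h = np.exp(-((coords - bmu) ** 2).sum(-1) / (2 * sigma ** 2))
    W += lr * h[..., None] * (x - W)              # pull units toward the sample

# Assign every sample to its best-matching unit
bmus = [np.unravel_index(np.argmin(((W - x) ** 2).sum(-1)), grid) for x in X]
print(bmus[:5], bmus[-5:])                        # the two clusters land apart
```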

Relevance:

30.00%

Publisher:

Abstract:

Land use classification has been a paramount task in recent years, since it makes it possible to identify illegal land use and to monitor deforested areas. Although several research works in the literature address this problem, we propose here land use recognition by means of Optimum-Path Forest (OPF) clustering, which has never before been applied in this context. Experiments comparing Optimum-Path Forest, Mean Shift and K-Means demonstrated the robustness of OPF for automatic land use classification of images obtained by the CBERS-2B and Ikonos-2 satellites. © 2011 IEEE.
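Since OPF clustering is not shipped with scikit-learn, the sketch below only reproduces the two baselines of the comparison (K-Means and Mean Shift) on synthetic stand-in pixel features; an OPF implementation would plug into the same loop. The data is an assumption, not CBERS-2B/Ikonos-2 imagery.

```python
# Baseline clustering comparison on synthetic stand-in pixel features.
import numpy as np
from sklearn.cluster import KMeans, MeanShift
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, cluster_std=1.0, random_state=0)

km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
ms_labels = MeanShift().fit_predict(X)            # estimates cluster count itself

print("k-means clusters:", np.unique(km_labels).size)
print("mean-shift clusters:", np.unique(ms_labels).size)
```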

Relevance:

30.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

30.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

30.00%

Publisher:

Abstract:

The purpose of this Thesis is to develop a robust and powerful method to classify galaxies from large surveys, in order to establish and confirm the connections between the principal observational parameters of galaxies (spectral features, colours, morphological indices) and to help unveil the evolution of these parameters from $z \sim 1$ to the local Universe. Within the framework of the zCOSMOS-bright survey, and making use of its large database of objects ($\sim 10\,000$ galaxies in the redshift range $0 < z \lesssim 1.2$) and its great reliability in redshift and spectral property determinations, we first adopt and extend the \emph{classification cube method} developed by Mignoli et al. (2009), which exploits the bimodal properties of galaxies (spectral, photometric and morphological) separately and then combines the three subclassifications. We use this classification method as a test for a newly devised statistical classification based on Principal Component Analysis and the Unsupervised Fuzzy Partition clustering method (PCA+UFP), which is able to define the galaxy population by exploiting its natural global bimodality, considering up to 8 different properties simultaneously. The PCA+UFP analysis is a powerful and robust tool for probing the nature and the evolution of galaxies in a survey. It allows the classification of galaxies to be defined with smaller uncertainties and adds the flexibility to adapt to different parameters: being a fuzzy classification, it avoids the problems of a hard classification such as the classification cube presented in the first part of the thesis. The PCA+UFP method can be easily applied to different datasets: it does not rely on the nature of the data and can therefore be successfully employed with other observables (magnitudes, colours) or derived properties (masses, luminosities, SFRs, etc.). The agreement between the two classification cluster definitions is very high. ``Early'' and ``late'' type galaxies are well defined by the spectral, photometric and morphological properties, both when considering these properties separately and combining the classifications (classification cube) and when treating them as a whole (PCA+UFP cluster analysis). Differences arise in the definition of outliers: the classification cube is much more sensitive to single measurement errors or misclassifications in one property than the PCA+UFP cluster analysis, in which errors are ``averaged out'' during the process. This method allowed us to observe the \emph{downsizing} effect taking place in the PC spaces: the migration from the blue cloud towards the red clump happens at higher redshifts for galaxies of larger mass. The determination of the transition mass $M_{\mathrm{cross}}$ is in good agreement with other values in the literature.
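A hedged sketch of the PCA+UFP idea under synthetic assumptions: objects described by 8 made-up observables are projected onto principal components, then soft-partitioned with plain fuzzy c-means standing in for UFP; memberships near 0.5 play the role of ambiguous objects whose errors get "averaged out" rather than becoming hard outliers.

```python
# PCA projection followed by a fuzzy two-cluster partition ("early"/"late").
# Fuzzy c-means stands in for UFP; all data is a synthetic assumption.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# 8 hypothetical observables (colours, spectral indices, morphology, ...)
X = np.vstack([rng.normal(-1, 0.5, (200, 8)), rng.normal(1, 0.5, (200, 8))])

Z = PCA(n_components=2).fit_transform(X)   # global bimodality shows in PC space

K, m = 2, 2.0                              # two fuzzy clusters, fuzzifier m
centers = Z[rng.choice(len(Z), K, replace=False)]
for _ in range(30):
    d = np.linalg.norm(Z[:, None] - centers[None], axis=2) + 1e-9
    U = d ** (-2 / (m - 1))
    U /= U.sum(1, keepdims=True)           # soft memberships per object
    centers = (U.T ** m @ Z) / (U.T ** m).sum(1, keepdims=True)

# Membership near 0.5 flags ambiguous objects instead of hard outliers
print("ambiguous fraction:", np.mean(np.abs(U[:, 0] - 0.5) < 0.1))
```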

Relevance:

30.00%

Publisher:

Abstract:

Nowadays communication is switching from a centralized scenario, in which media such as newspapers, radio and TV programs produce information and people are mere consumers, to a completely different decentralized scenario, in which everyone is potentially an information producer, through social networks, blogs and forums that allow real-time worldwide information exchange. These new instruments, as a result of their widespread diffusion, have come to play an important socio-economic role: they are the most used communication media and consequently constitute the main source of information on which enterprises, political parties and other organizations can rely. Analyzing data stored in servers all over the world is feasible by means of Text Mining techniques such as Sentiment Analysis, which aims to extract opinions from huge amounts of unstructured text. This can be used to determine, for instance, the degree of user satisfaction with products, services, politicians and so on. In this context, this dissertation presents new Document Sentiment Classification methods based on the mathematical theory of Markov Chains. All these approaches rely on a Markov Chain based model, which is language independent and whose key features are simplicity and generality, making it attractive compared to previous, more sophisticated techniques. Every technique discussed has been tested on both Single-Domain and Cross-Domain Sentiment Classification, comparing performance with that of two previous works. The analysis shows that some of the examined algorithms produce results comparable with the best methods in the literature on both single-domain and cross-domain tasks for $2$-class (i.e. positive and negative) Document Sentiment Classification. There is still room for improvement, however: this work also indicates the way forward, namely that a good novel feature selection process would be enough to outperform the state of the art. Furthermore, since some of the proposed approaches show promising results in $2$-class Single-Domain Sentiment Classification, future work will validate these results in tasks with more than $2$ classes.
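A minimal sketch of the Markov-chain flavour of Document Sentiment Classification (a toy reconstruction, not the dissertation's models): one word-transition matrix is estimated per class, and a document is assigned to the class under which its word chain has the highest smoothed log-likelihood. Corpus, smoothing constant and vocabulary size are illustrative assumptions.

```python
# Per-class word-transition Markov chains scored by smoothed log-likelihood.
from collections import defaultdict
import math

train = {
    "pos": ["good great film", "great plot good acting"],
    "neg": ["bad boring film", "boring plot bad acting"],
}

# Count word-to-word transitions per class
trans = {c: defaultdict(lambda: defaultdict(int)) for c in train}
for c, docs in train.items():
    for doc in docs:
        w = doc.split()
        for a, b in zip(w, w[1:]):
            trans[c][a][b] += 1

def score(doc, c, alpha=1.0, vocab=20):
    # Add-alpha smoothed log-likelihood of the word chain under class c
    s = 0.0
    w = doc.split()
    for a, b in zip(w, w[1:]):
        row = trans[c][a]
        s += math.log((row[b] + alpha) / (sum(row.values()) + alpha * vocab))
    return s

doc = "great film good plot"
print(max(train, key=lambda c: score(doc, c)))   # prints "pos"
```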

Relevance:

30.00%

Publisher:

Abstract:

Arterio-venous malformations (AVMs) are congenital vascular malformations (CVMs) that result from birth defects involving vessels of both arterial and venous origin, resulting in direct communication between vessels of different sizes or a meshwork of primitive reticular networks of dysplastic minute vessels that have failed to mature into 'capillary' vessels, termed the "nidus". These lesions are defined by shunting of high-velocity, low-resistance flow from the arterial vasculature into the venous system in a variety of fistulous conditions. Systematic classification systems developed by various groups of experts (the Hamburg, ISSVA, Schobinger and angiographic classifications of AVMs) have resulted in a better understanding of the biology and natural history of these lesions and improved management of CVMs and AVMs. The Hamburg classification, based on the embryological differentiation between extratruncular and truncular types of lesion, allows the potential for progression and recurrence of these lesions to be determined. The majority of AVMs are extra-truncular lesions with persistent proliferative potential, whereas truncular AVM lesions are exceedingly rare. Regardless of the type, AV shunting may ultimately result in significant anatomical, pathophysiological and hemodynamic consequences. Therefore, despite their relative rarity (10-20% of all CVMs), AVMs remain the most challenging and potentially limb- or life-threatening form of vascular anomaly. The initial diagnosis and assessment may be facilitated by non-invasive to minimally invasive investigations such as duplex ultrasound, magnetic resonance imaging (MRI), MR angiography (MRA), computerized tomography (CT) and CT angiography (CTA). Arteriography remains the diagnostic gold standard and is required for planning subsequent treatment. A multidisciplinary team approach should be utilized to integrate surgical and non-surgical interventions for optimum care. Currently available treatments are associated with significant risk of complications and morbidity; however, an early aggressive approach to eliminate the nidus (if present) may be undertaken if the benefits exceed the risks. Trans-arterial coil embolization or ligation of feeding arteries that leave the nidus intact are incorrect approaches and may result in proliferation of the lesion; furthermore, such procedures would prevent future endovascular access to the lesions via the arterial route. Surgically inaccessible, infiltrating, extra-truncular AVMs can be treated with endovascular therapy as an independent modality. Among the various embolo-sclerotherapy agents, ethanol sclerotherapy produces the best long-term outcomes with minimal recurrence; however, this procedure requires extensive training and sufficient experience to minimize complications and associated morbidity. For surgically accessible lesions, surgical resection may be the treatment of choice, with a chance of optimal control. Preoperative sclerotherapy or embolization may supplement the subsequent surgical excision by reducing morbidity (e.g. operative bleeding) and defining the lesion borders; such a combined approach may provide excellent potential for a curative result. Conclusion: AVMs are high-flow congenital vascular malformations that may occur in any part of the body. The clinical presentation depends on the extent and size of the lesion and can range from an asymptomatic birthmark to congestive heart failure. Detailed investigations including duplex ultrasound, MRI/MRA and CT/CTA are required to develop an appropriate treatment plan. Appropriate management is best achieved via a multidisciplinary approach, and interventions should be undertaken by appropriately trained physicians.

Relevance:

30.00%

Publisher:

Abstract:

Venous malformations (VMs) are the most common vascular developmental anomalies (birth defects). These defects are caused by developmental arrest of the venous system during various stages of embryogenesis. VMs remain a difficult diagnostic and therapeutic challenge due to the wide range of clinical presentations, unpredictable clinical course, erratic response to treatment with high recurrence/persistence rates, high morbidity following non-specific conventional treatment, and confusing terminology. The Consensus Panel reviewed the scientific literature up to the year 2013 to update the previous IUP Consensus (2009) on the same subject. The ISSVA classification, with special merit for differentiating congenital vascular malformations (CVMs) from vascular tumors, was reinforced with an additional review of syndrome-based classification. A "modified" Hamburg classification was adopted to emphasize the importance of extratruncular vs. truncular sub-types of VMs; this incorporates the embryological origin, morphological differences, unique characteristics, prognosis and recurrence rates of VMs based on the embryological classification. The definition and classification of VMs were strengthened with the addition of angiographic data that determine the hemodynamic characteristics and the anatomical pattern of draining veins, and hence the risk of complication following sclerotherapy. Hemolymphatic malformations, a combined condition incorporating lymphatic malformations (LMs) and other CVMs, were treated as a separate topic to differentiate them from isolated VMs and to rectify the existing confusion with name-based eponyms such as Klippel-Trenaunay syndrome. Contemporary concepts of VMs were updated with new data, including genetic findings linked to the etiology of CVMs and chronic cerebrospinal venous insufficiency. In addition, newly established information on coagulopathy, including the role of D-dimer, was thoroughly reviewed to provide guidelines on investigations and anticoagulation therapy in the management of VMs. Congenital vascular bone syndrome, resulting in angio-osteo-hyper/hypotrophy, and the (lateral) marginal vein were reviewed separately. Background data on arterio-venous malformations were included to differentiate this anomaly from syndrome-based VMs. For treatment, a new section on laser therapy and a practical guideline for follow-up assessment were added to strengthen the management principle of the multidisciplinary approach. All other therapeutic modalities were thoroughly updated to accommodate concepts that have changed through the years.

Relevance:

30.00%

Publisher:

Abstract:

The diagnostic approach to vascular anomalies should include the distinction between vascular tumors (i.e. hemangiomas) and congenital vascular malformations (CVMs). This step is based more on history and clinical examination than on instrumental evaluation. In children, duplex ultrasound and histology can be helpful to separate hypervascularized tumors from CVMs. Appropriate recording of objective measures, such as size or flow volume, is required in order to evaluate the progression of the pathology and/or to assess the results of the adopted therapeutic interventions. The anatomic, pathological and hemodynamic characteristics, the secondary effects on the surrounding tissues, and the systemic manifestations should be defined. The basic diagnostic tools are duplex sonography followed by MRI or CT scanning. The vascular anomaly should be defined according to the Hamburg classification, separating vascular tumors from vascular malformations and then high-flow from low-flow CVMs. Diagnostic investigations are best undertaken at centers where the subsequent therapeutic interventions will be performed.

Relevance:

30.00%

Publisher:

Abstract:

The classification of neuroendocrine neoplasms (NENs) has been evolving steadily over the last decades. Important prognostic factors of NENs are their proliferative activity and the presence or absence of necrosis. These factors are reported in NENs of all body sites; however, the terminology as well as the exact rules of classification differ according to the location of the primary tumor. Only in gastroenteropancreatic (GEP) NENs is a formal grading performed, based on proliferation assessed by the mitotic count and/or the Ki-67 proliferation index. In the lung, NEN grading is an intrinsic part of the tumor designation, with typical carcinoids corresponding to neuroendocrine tumor (NET) G1 and atypical carcinoids to NET G2; however, the presence or absence of necrotic foci is as important as proliferation for the differentiation between typical and atypical carcinoids. Immunohistochemical markers can be used to demonstrate neuroendocrine differentiation; synaptophysin and chromogranin A are, to date, the most reliable and most commonly used for this purpose. Beyond this, other markers can be helpful, for example in the situation of a NET metastasis of unknown primary, where a hormonal profile or a panel of transcription factors can give hints to the primary site. Many immunohistochemical markers have been shown to correlate with prognosis but are not used in clinical practice, for example cytokeratin 19 and KIT expression in pancreatic NETs. There is no predictive biomarker in use, with the exception of somatostatin receptor (SSTR) 2 expression for predicting the amenability of a tumor to in vivo SSTR targeting for imaging or therapy.

Relevance:

30.00%

Publisher:

Abstract:

Salamanca is catalogued as one of the most polluted cities in Mexico. In order to observe the behavior and clarify the influence of wind parameters on sulphur dioxide (SO2) concentrations, a Self-Organizing Map (SOM) neural network has been implemented at three monitoring locations for the period from January 1 to December 31, 2006. The maximum and minimum daily SO2 concentrations measured during 2006 were correlated with the wind parameters of the same period. The main advantage of the SOM neural network is that it allows data from different sensors to be integrated and provides readily interpretable results. In particular, it is a powerful mapping and classification tool, which presents information in an accessible way and facilitates the task of establishing an order of priority among the distinguished groups of concentrations depending on their need for further research or remediation actions in subsequent management steps. For each monitoring location, SOM classifications were evaluated with respect to pollution levels established by health authorities. The classification system can help establish a better air-quality monitoring methodology, which is essential for assessing the effectiveness of imposed pollution controls and strategies, and can facilitate pollutant reduction.
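A sketch of mapping daily records onto a SOM using the third-party minisom package (pip install minisom); the (SO2, wind) matrix below is a synthetic stand-in for the 2006 measurements, and the grid size and training length are illustrative choices, not the study's configuration.

```python
# Map daily (SO2, wind) records onto a SOM; node assignments act as
# concentration groups that can be ranked against health thresholds.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(3)
# columns: max SO2, min SO2, wind speed, wind direction (synthetic, scaled 0-1)
data = rng.random((365, 4))

som = MiniSom(5, 5, 4, sigma=1.0, learning_rate=0.5, random_seed=3)
som.train_random(data, 5000)

# Each day lands on a map node; counting days per node gives the groups
nodes = np.array([som.winner(d) for d in data])
days_per_node = {tuple(n): int((nodes == n).all(1).sum())
                 for n in np.unique(nodes, axis=0)}
print(days_per_node)
```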

Relevance:

30.00%

Publisher:

Abstract:

Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually and requires adequate machine learning and data mining techniques. Classification is one of the most important such techniques and has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from available training data, and second, classify new incoming unseen data instances using the learned classifier. Classification is supervised when all class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when all class values are missing from the training data (i.e., unlabeled data). Besides this taxonomy, the classification problem can be categorized as uni-dimensional or multi-dimensional depending on the number of class variables (one or more, respectively), or as stationary or streaming depending on the characteristics of the data and the rate of change underlying it. Throughout this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised uni-dimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we basically use Bayesian network classifiers as models.
The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data, proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson's Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against methods commonly used for the Parkinson's disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both case studies, results are promising in terms of classification accuracy as well as the analysis of the learned MBC graphical structures, which identify known and novel interactions among variables.
The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their highly rapid generation process and their concept-drifting aspect: the learned concepts and/or the underlying distribution are likely to change and evolve over time, which makes the current classification model out-of-date and in need of updating. CPL-DS uses the Kullback-Leibler divergence and bootstrapping to quantify and detect three possible kinds of drift: feature, conditional, or dual. If any occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. CPL-DS is general in that it can be applied to several classification models. Using two different models, namely, the naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where newly received files must be continuously classified as malware or goodware.
Experimental results show that our approach is effective for detecting different kinds of drift from partially labeled data streams, as well as having good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor concept drift over time using the average log-likelihood score and the Page-Hinkley test. If a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. An experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
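As an illustration of the drift-monitoring step in the third contribution, the sketch below runs a Page-Hinkley test over a stream of average log-likelihood scores and flags the point where the mean drops; the delta and lambda values are illustrative, not the thesis settings, and the score stream is synthetic.

```python
# Page-Hinkley monitoring of a log-likelihood score stream (illustrative).
import numpy as np

def page_hinkley(stream, delta=0.01, lam=10.0):
    """Return the index where a downward shift in the mean is flagged, or None."""
    mean, cum, max_cum = 0.0, 0.0, 0.0
    for t, x in enumerate(stream, 1):
        mean += (x - mean) / t          # running mean of the scores
        cum += x - mean + delta         # cumulative deviation, biased by delta
        max_cum = max(max_cum, cum)     # best level seen so far
        if max_cum - cum > lam:         # sustained drop past the threshold
            return t
    return None

rng = np.random.default_rng(4)
scores = np.concatenate([rng.normal(-1.0, 0.2, 300),     # stable concept
                         rng.normal(-2.0, 0.2, 300)])    # drift: likelihood drops
print("drift flagged at sample:", page_hinkley(scores))  # shortly after 300
```

On a flagged drift, a system in this spirit would either adapt the current classifier locally (as LA-MB-MBC does around changed nodes) or relearn it from scratch (as GA-MB-MBC does).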