956 resultados para Real Classification
Resumo:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Resumo:
The clustering problem consists in finding patterns in a data set in order to divide it into clusters with high within-cluster similarity. This paper presents the study of a problem, here called MMD problem, which aims at finding a clustering with a predefined number of clusters that minimizes the largest within-cluster distance (diameter) among all clusters. There are two main objectives in this paper: to propose heuristics for the MMD and to evaluate the suitability of the best proposed heuristic results according to the real classification of some data sets. Regarding the first objective, the results obtained in the experiments indicate a good performance of the best proposed heuristic that outperformed the Complete Linkage algorithm (the most used method from the literature for this problem). Nevertheless, regarding the suitability of the results according to the real classification of the data sets, the proposed heuristic achieved better quality results than C-Means algorithm, but worse than Complete Linkage.
Resumo:
After decades of confusion in lymphoma classification clearness was achieved with the publication of the REAL classification 1994 and of the WHO classification 2001. The revised 4th edition 2008 features some additional new categories. The WHO classification comprises B- and T-lymphoblastic neoplasms, mature B-cell lymphomas, mature T-cell and NK-cell lymphomas and Hodgkin lymphomas. A modern diagnostic work-up of lymphomas is based on morphology, immunohistochemistry and increasingly on molecular studies. Last but not least the evaluation of all these findings by an expert haematopathologist, who collaborates closely with the treating clinicians, is essential. The aim is to give an overview of the most frequent mature B-cell lymphomas and the most important classical Hodgkin lymphomas with focus on morphology and immunohistochemistry.
Resumo:
Poder clasificar de manera precisa la aplicación o programa del que provienen los flujos que conforman el tráfico de uso de Internet dentro de una red permite tanto a empresas como a organismos una útil herramienta de gestión de los recursos de sus redes, así como la posibilidad de establecer políticas de prohibición o priorización de tráfico específico. La proliferación de nuevas aplicaciones y de nuevas técnicas han dificultado el uso de valores conocidos (well-known) en puertos de aplicaciones proporcionados por la IANA (Internet Assigned Numbers Authority) para la detección de dichas aplicaciones. Las redes P2P (Peer to Peer), el uso de puertos no conocidos o aleatorios, y el enmascaramiento de tráfico de muchas aplicaciones en tráfico HTTP y HTTPS con el fin de atravesar firewalls y NATs (Network Address Translation), entre otros, crea la necesidad de nuevos métodos de detección de tráfico. El objetivo de este estudio es desarrollar una serie de prácticas que permitan realizar dicha tarea a través de técnicas que están más allá de la observación de puertos y otros valores conocidos. Existen una serie de metodologías como Deep Packet Inspection (DPI) que se basa en la búsqueda de firmas, signatures, en base a patrones creados por el contenido de los paquetes, incluido el payload, que caracterizan cada aplicación. Otras basadas en el aprendizaje automático de parámetros de los flujos, Machine Learning, que permite determinar mediante análisis estadísticos a qué aplicación pueden pertenecer dichos flujos y, por último, técnicas de carácter más heurístico basadas en la intuición o el conocimiento propio sobre tráfico de red. En concreto, se propone el uso de alguna de las técnicas anteriormente comentadas en conjunto con técnicas de minería de datos como son el Análisis de Componentes Principales (PCA por sus siglas en inglés) y Clustering de estadísticos extraídos de los flujos procedentes de ficheros de tráfico de red. Esto implicará la configuración de diversos parámetros que precisarán de un proceso iterativo de prueba y error que permita dar con una clasificación del tráfico fiable. El resultado ideal sería aquel en el que se pudiera identificar cada aplicación presente en el tráfico en un clúster distinto, o en clusters que agrupen grupos de aplicaciones de similar naturaleza. Para ello, se crearán capturas de tráfico dentro de un entorno controlado e identificando cada tráfico con su aplicación correspondiente, a continuación se extraerán los flujos de dichas capturas. Tras esto, parámetros determinados de los paquetes pertenecientes a dichos flujos serán obtenidos, como por ejemplo la fecha y hora de llagada o la longitud en octetos del paquete IP. Estos parámetros serán cargados en una base de datos MySQL y serán usados para obtener estadísticos que ayuden, en un siguiente paso, a realizar una clasificación de los flujos mediante minería de datos. Concretamente, se usarán las técnicas de PCA y clustering haciendo uso del software RapidMiner. Por último, los resultados obtenidos serán plasmados en una matriz de confusión que nos permitirá que sean valorados correctamente. ABSTRACT. Being able to classify the applications that generate the traffic flows in an Internet network allows companies and organisms to implement efficient resource management policies such as prohibition of specific applications or prioritization of certain application traffic, looking for an optimization of the available bandwidth. The proliferation of new applications and new technics in the last years has made it more difficult to use well-known values assigned by the IANA (Internet Assigned Numbers Authority), like UDP and TCP ports, to identify the traffic. Also, P2P networks and data encapsulation over HTTP and HTTPS traffic has increased the necessity to improve these traffic analysis technics. The aim of this project is to develop a number of techniques that make us able to classify the traffic with more than the simple observation of the well-known ports. There are some proposals that have been created to cover this necessity; Deep Packet Inspection (DPI) tries to find signatures in the packets reading the information contained in them, the payload, looking for patterns that can be used to characterize the applications to which that traffic belongs; Machine Learning procedures work with statistical analysis of the flows, trying to generate an automatic process that learns from those statistical parameters and calculate the likelihood of a flow pertaining to a certain application; Heuristic Techniques, finally, are based in the intuition or the knowledge of the researcher himself about the traffic being analyzed that can help him to characterize the traffic. Specifically, the use of some of the techniques previously mentioned in combination with data mining technics such as Principal Component Analysis (PCA) and Clustering (grouping) of the flows extracted from network traffic captures are proposed. An iterative process based in success and failure will be needed to configure these data mining techniques looking for a reliable traffic classification. The perfect result would be the one in which the traffic flows of each application is grouped correctly in each cluster or in clusters that contain group of applications of similar nature. To do this, network traffic captures will be created in a controlled environment in which every capture is classified and known to pertain to a specific application. Then, for each capture, all the flows will be extracted. These flows will be used to extract from them information such as date and arrival time or the IP length of the packets inside them. This information will be then loaded to a MySQL database where all the packets defining a flow will be classified and also, each flow will be assigned to its specific application. All the information obtained from the packets will be used to generate statistical parameters in order to describe each flow in the best possible way. After that, data mining techniques previously mentioned (PCA and Clustering) will be used on these parameters making use of the software RapidMiner. Finally, the results obtained from the data mining will be compared with the real classification of the flows that can be obtained from the database. A Confusion Matrix will be used for the comparison, letting us measure the veracity of the developed classification process.
Resumo:
Purpose: To evaluate the clinical features, treatment, and outcomes of a cohort of patients with ocular adnexal lymphoproliferative disease classified according to the World Health Organization modification of the Revised European-American Classification of Lymphoid neoplasms and to perform a robust statistical analysis of these data. Methods: Sixty-nine cases of ocular adnexal lymphoproliferative disease, seen in a tertiary referral center from 1992 to 2003, were included in the study. Lesions were classified by using the World Health Organization modification of the Revised European-American Classification of Lymphoid neoplasms classification. Outcome variables included disease-specific Survival, relapse-free survival, local control, and distant control. Results: Stage IV disease at presentation, aggressive lymphoma histology, the presence of prior or concurrent systemic lymphoma at presentation, and bilateral adnexal disease were significant predictors for reduced disease-specific survival, local control, and distant control. Multivariate analysis found that aggressive histology and bilateral adnexal disease had significantly reduced disease-specific Survival. Conclusions: The typical presentation of adnexal lymphoproliferative disease is with a painless mass, swelling, or proptosis; however, pain and inflammation occurred in 20% and 30% of patients, respectively. Stage at presentation, tumor histology, primary or secondary status, and whether the process was unilateral or bilateral were significant variables for disease outcome. In this study, distant spread of lymphoma was lower in patients who received greater than 20 Gy of orbital radiotherapy.
Resumo:
Behavioral biometrics is one of the areas with growing interest within the biosignal research community. A recent trend in the field is ECG-based biometrics, where electrocardiographic (ECG) signals are used as input to the biometric system. Previous work has shown this to be a promising trait, with the potential to serve as a good complement to other existing, and already more established modalities, due to its intrinsic characteristics. In this paper, we propose a system for ECG biometrics centered on signals acquired at the subject's hand. Our work is based on a previously developed custom, non-intrusive sensing apparatus for data acquisition at the hands, and involved the pre-processing of the ECG signals, and evaluation of two classification approaches targeted at real-time or near real-time applications. Preliminary results show that this system leads to competitive results both for authentication and identification, and further validate the potential of ECG signals as a complementary modality in the toolbox of the biometric system designer.
Resumo:
Genetic Programming (GP) is a widely used methodology for solving various computational problems. GP's problem solving ability is usually hindered by its long execution times. In this thesis, GP is applied toward real-time computer vision. In particular, object classification and tracking using a parallel GP system is discussed. First, a study of suitable GP languages for object classification is presented. Two main GP approaches for visual pattern classification, namely the block-classifiers and the pixel-classifiers, were studied. Results showed that the pixel-classifiers generally performed better. Using these results, a suitable language was selected for the real-time implementation. Synthetic video data was used in the experiments. The goal of the experiments was to evolve a unique classifier for each texture pattern that existed in the video. The experiments revealed that the system was capable of correctly tracking the textures in the video. The performance of the system was on-par with real-time requirements.
Resumo:
An important application of Big Data Analytics is the real-time analysis of streaming data. Streaming data imposes unique challenges to data mining algorithms, such as concept drifts, the need to analyse the data on the fly due to unbounded data streams and scalable algorithms due to potentially high throughput of data. Real-time classification algorithms that are adaptive to concept drifts and fast exist, however, most approaches are not naturally parallel and are thus limited in their scalability. This paper presents work on the Micro-Cluster Nearest Neighbour (MC-NN) classifier. MC-NN is based on an adaptive statistical data summary based on Micro-Clusters. MC-NN is very fast and adaptive to concept drift whilst maintaining the parallel properties of the base KNN classifier. Also MC-NN is competitive compared with existing data stream classifiers in terms of accuracy and speed.
Resumo:
Hospitals attached to the Spanish Ministry of Health are currently using the International Classification of Diseases 9 Clinical Modification (ICD9-CM) to classify health discharge records. Nowadays, this work is manually done by experts. This paper tackles the automatic classification of real Discharge Records in Spanish following the ICD9-CM standard. The challenge is that the Discharge Records are written in spontaneous language. We explore several machine learning techniques to deal with the classification problem. Random Forest resulted in the most competitive one, achieving an F-measure of 0.876.
Resumo:
Fast Classification (FC) networks were inspired by a biologically plausible mechanism for short term memory where learning occurs instantaneously. Both weights and the topology for an FC network are mapped directly from the training samples by using a prescriptive training scheme. Only two presentations of the training data are required to train an FC network. Compared with iterative learning algorithms such as Back-propagation (which may require many hundreds of presentations of the training data), the training of FC networks is extremely fast and learning convergence is always guaranteed. Thus FC networks may be suitable for applications where real-time classification is needed. In this paper, the FC networks are applied for the real-time extraction of gene expressions for Chlamydia microarray data. Both the classification performance and learning time of the FC networks are compared with the Multi-Layer Proceptron (MLP) networks and support-vector-machines (SVM) in the same classification task. The FC networks are shown to have extremely fast learning time and comparable classification accuracy.
Resumo:
Urban regeneration is more and more a “universal issue” and a crucial factor in the new trends of urban planning. It is no longer only an area of study and research; it became part of new urban and housing policies. Urban regeneration involves complex decisions as a consequence of the multiple dimensions of the problems that include special technical requirements, safety concerns, socio-economic, environmental, aesthetic, and political impacts, among others. This multi-dimensional nature of urban regeneration projects and their large capital investments justify the development and use of state-of-the-art decision support methodologies to assist decision makers. This research focuses on the development of a multi-attribute approach for the evaluation of building conservation status in urban regeneration projects, thus supporting decision makers in their analysis of the problem and in the definition of strategies and priorities of intervention. The methods presented can be embedded into a Geographical Information System for visualization of results. A real-world case study was used to test the methodology, whose results are also presented.
Resumo:
Esta tese tem por objectivo o desenho e avaliação de um sistema de contagem e classificação de veículos automóveis em tempo-real e sem fios. Pretende, também, ser uma alternativa aos actuais equipamentos, muito intrusivos nas vias rodoviárias. Esta tese inclui um estudo sobre as comunicações sem fios adequadas a uma rede de equipamentos sensores rodoviários, um estudo sobre a utilização do campo magnético como meio físico de detecção e contagem de veículos e um estudo sobre a autonomia energética dos equipamentos inseridos na via, com recurso, entre outros, à energia solar. O projecto realizado no âmbito desta tese incorpora, entre outros, a digitalização em tempo real da assinatura magnética deixada pela passagem de um veículo, no campo magnético da Terra, o respectivo envio para servidor via rádio e WAN, Wide Area Network, e o desenvolvimento de software tendo por base a pilha de protocolos ZigBee. Foram desenvolvidas aplicações para o equipamento sensor, para o coordenador, para o painel de controlo e para a biblioteca de Interface de um futuro servidor aplicacional. O software desenvolvido para o equipamento sensor incorpora ciclos de detecção e digitalização, com pausas de adormecimento de baixo consumo, e a activação das comunicações rádio durante a fase de envio, assegurando assim uma estratégia de poupança energética. Os resultados obtidos confirmam a viabilidade desta tecnologia para a detecção e contagem de veículos, assim como para a captura de assinatura usando magnetoresistências. Permitiram ainda verificar o alcance das comunicações sem fios com equipamento sensor embebido no asfalto e confirmar o modelo de cálculo da superfície do painel solar bem como o modelo de consumo energético do equipamento sensor.
Resumo:
Trabalho de Projeto para obtenção do grau de Mestre em Engenharia Civil na Área de Especialização em Edificações
Resumo:
In the present paper we assess the performance of information-theoretic inspired risks functionals in multilayer perceptrons with reference to the two most popular ones, Mean Square Error and Cross-Entropy. The information-theoretic inspired risks, recently proposed, are: HS and HR2 are, respectively, the Shannon and quadratic Rényi entropies of the error; ZED is a risk reflecting the error density at zero errors; EXP is a generalized exponential risk, able to mimic a wide variety of risk functionals, including the information-thoeretic ones. The experiments were carried out with multilayer perceptrons on 35 public real-world datasets. All experiments were performed according to the same protocol. The statistical tests applied to the experimental results showed that the ubiquitous mean square error was the less interesting risk functional to be used by multilayer perceptrons. Namely, mean square error never achieved a significantly better classification performance than competing risks. Cross-entropy and EXP were the risks found by several tests to be significantly better than their competitors. Counts of significantly better and worse risks have also shown the usefulness of HS and HR2 for some datasets.