830 results for Computing Classification Systems
Abstract:
The Ecological Society of America and NOAA's Offices of Habitat Conservation and Protected Resources sponsored a workshop to develop a national marine and estuarine ecosystem classification system. Among the 22 people involved were scientists who had developed various regional classification systems and managers from NOAA and other federal agencies who might ultimately use this system for conservation and management. The objectives were to: (1) review existing global and regional classification systems; (2) develop the framework of a national classification system; and (3) propose a plan to expand the framework into a comprehensive classification system. Although there has been progress in the development of marine classifications in recent years, these have been either regionally focused (e.g., Pacific islands) or restricted to specific habitats (e.g., wetlands; deep seafloor). Participants in the workshop looked for commonalities across existing classification systems and tried to link these using broad-scale factors important to ecosystem structure and function.
Abstract:
Nearest neighbor retrieval is the task of identifying, given a database of objects and a query object, the objects in the database that are the most similar to the query. Retrieving nearest neighbors is a necessary component of many practical applications, in fields as diverse as computer vision, pattern recognition, multimedia databases, bioinformatics, and computer networks. At the same time, finding nearest neighbors accurately and efficiently can be challenging, especially when the database contains a large number of objects, and when the underlying distance measure is computationally expensive. This thesis proposes new methods for improving the efficiency and accuracy of nearest neighbor retrieval and classification in spaces with computationally expensive distance measures. The proposed methods are domain-independent, and can be applied in arbitrary spaces, including non-Euclidean and non-metric spaces. In this thesis particular emphasis is given to computer vision applications related to object and shape recognition, where expensive non-Euclidean distance measures are often needed to achieve high accuracy. The first contribution of this thesis is the BoostMap algorithm for embedding arbitrary spaces into a vector space with a computationally efficient distance measure. Using this approach, an approximate set of nearest neighbors can be retrieved efficiently - often orders of magnitude faster than retrieval using the exact distance measure in the original space. The BoostMap algorithm has two key distinguishing features with respect to existing embedding methods. First, embedding construction explicitly maximizes the amount of nearest neighbor information preserved by the embedding. Second, embedding construction is treated as a machine learning problem, in contrast to existing methods that are based on geometric considerations. The second contribution is a method for constructing query-sensitive distance measures for the purposes of nearest neighbor retrieval and classification. In high-dimensional spaces, query-sensitive distance measures allow for automatic selection of the dimensions that are the most informative for each specific query object. It is shown theoretically and experimentally that query-sensitivity increases the modeling power of embeddings, allowing embeddings to capture a larger amount of the nearest neighbor structure of the original space. The third contribution is a method for speeding up nearest neighbor classification by combining multiple embedding-based nearest neighbor classifiers in a cascade. In a cascade, computationally efficient classifiers are used to quickly classify easy cases, and classifiers that are more computationally expensive and also more accurate are only applied to objects that are harder to classify. An interesting property of the proposed cascade method is that, under certain conditions, classification time actually decreases as the size of the database increases, a behavior that is in stark contrast to the behavior of typical nearest neighbor classification systems. The proposed methods are evaluated experimentally in several different applications: hand shape recognition, off-line character recognition, online character recognition, and efficient retrieval of time series. In all datasets, the proposed methods lead to significant improvements in accuracy and efficiency compared to existing state-of-the-art methods. 
In some datasets, the general-purpose methods introduced in this thesis even outperform domain-specific methods that have been custom-designed for such datasets.
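The retrieval strategy described above is commonly realized as filter-and-refine: a cheap embedded distance shortlists candidates, and the expensive exact distance is applied only to the shortlist. A minimal sketch follows; the unweighted reference-object embedding and all names are illustrative assumptions, not the BoostMap construction itself, which learns a weighted embedding by boosting.

```python
import numpy as np

def embed(x, references, exact_dist):
    """Map an object to its vector of distances to a few reference objects
    (a simple Lipschitz-style embedding; BoostMap learns and weights such
    coordinates, which is not reproduced here)."""
    return np.array([exact_dist(x, r) for r in references])

def filter_and_refine(query, db, db_emb, references, exact_dist, shortlist=50):
    """Approximate nearest neighbor: rank by the cheap embedded distance,
    then apply the expensive exact distance only to a shortlist.
    db_emb holds the database embeddings, precomputed offline."""
    q = embed(query, references, exact_dist)
    approx = np.abs(db_emb - q).sum(axis=1)          # cheap L1 filter step
    candidates = np.argsort(approx)[:shortlist]
    return min(candidates, key=lambda i: exact_dist(query, db[i]))  # refine
```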
Abstract:
Predictive performance evaluation is a fundamental issue in the design, development, and deployment of classification systems. As predictive performance evaluation is a multidimensional problem, single scalar summaries such as error rate, although quite convenient due to their simplicity, can seldom evaluate all the aspects that a complete and reliable evaluation must consider. Because of this, various graphical performance evaluation methods are increasingly drawing the attention of the machine learning, data mining, and pattern recognition communities. The main advantage of these methods resides in their ability to depict the trade-offs between evaluation aspects in a multidimensional space rather than reducing those aspects to an arbitrarily chosen (and often biased) single scalar measure. Furthermore, to select a suitable graphical method for a given task, it is crucial to identify its strengths and weaknesses. This paper surveys various graphical methods often used for predictive performance evaluation. By presenting these methods within the same framework, we hope this paper may shed some light on deciding which methods are more suitable to use in different situations.
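One of the graphical methods such surveys typically cover is the ROC curve, which depicts the trade-off between true-positive and false-positive rates across all decision thresholds. A minimal sketch with toy data (assumed, for illustration only):

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep the decision threshold over the sorted scores and return
    the resulting (false-positive rate, true-positive rate) pairs."""
    order = np.argsort(scores)[::-1]            # descending score order
    y = np.asarray(labels)[order]
    P = y.sum()
    N = len(y) - P
    tpr = np.concatenate([[0.0], np.cumsum(y) / P])
    fpr = np.concatenate([[0.0], np.cumsum(1 - y) / N])
    return fpr, tpr

# toy classifier scores with ground-truth labels
fpr, tpr = roc_points([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0])
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)   # trapezoidal AUC
```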
Abstract:
Committees of classifiers may be used to improve the accuracy of classification systems: different classifiers trained to solve the same problem can be combined into a system of greater accuracy, called a committee of classifiers. For this to succeed, the classifiers must make mistakes on different objects of the problem, so that the errors of one classifier are outvoted by the correct outputs of the others when the committee's combination method is applied. This property of erring on different objects is called diversity. However, most diversity measures fail to capture this requirement adequately. Recently, two measures of diversity (good diversity and bad diversity) were proposed with the aim of helping to generate more accurate committees. This paper performs an experimental analysis of these measures applied directly to the construction of committees of classifiers. The construction method is modeled as a search, over the feature sets of the problem's databases and the candidate committee members, for the committee of classifiers that produces the most accurate classification. This problem is solved by metaheuristic optimization techniques in their mono- and multi-objective versions. Analyses are performed to verify whether using or adding the measures of good and bad diversity as optimization objectives creates more accurate committees. Thus, the contribution of this study is to determine whether good and bad diversity can serve, in mono-objective and multi-objective optimization techniques, as objectives for building committees of classifiers more accurate than those built by the same process using only classification accuracy as the optimization objective.
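Assuming the measures in question are the good/bad diversity decomposition of the majority-vote error proposed by Brown and Kuncheva, a minimal sketch of how they can be computed from member predictions might look as follows (shapes and names are illustrative):

```python
import numpy as np

def majority_vote(member_preds):
    """Committee output: the most common label per example (binary 0/1 here)."""
    return (np.asarray(member_preds).mean(axis=0) > 0.5).astype(int)

def good_bad_diversity(member_preds, y_true):
    """Average fraction of members that disagree with the majority vote,
    split over examples the committee gets right (good diversity,
    helpful disagreement) vs wrong (bad diversity, harmful disagreement)."""
    member_preds = np.asarray(member_preds)     # shape: (n_members, n_examples)
    vote = majority_vote(member_preds)
    disagree = (member_preds != vote).mean(axis=0)
    correct = vote == np.asarray(y_true)
    good = disagree[correct].sum() / len(y_true)
    bad = disagree[~correct].sum() / len(y_true)
    return good, bad
```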
Abstract:
This paper proposes a fuzzy classification system for the risk of weed infestation in agricultural zones that accounts for the variability of weeds. The inputs of the system are features of the infestation extracted from maps estimated by kriging for weed seed production and weed coverage, and from the competitiveness inferred from narrow- and broad-leaved weeds. Furthermore, a Bayesian network classifier is used to extract rules from data, which are compared to the fuzzy rule set obtained on the basis of specialist knowledge. Results for the risk inference in a maize crop field are presented and evaluated against the estimated yield loss. © 2009 IEEE.
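A minimal sketch of this kind of fuzzy risk inference, with two illustrative rules and triangular membership functions (all membership parameters, the rule base, and the risk prototypes are assumptions, not the paper's actual system):

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return float(np.clip(min((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0))

def infestation_risk(coverage, seeds):
    """Two-rule Mamdani-style inference on inputs normalized to [0, 1], e.g.
    IF coverage IS high AND seed production IS high THEN risk IS high."""
    cov_high = tri(coverage, 0.3, 1.0, 1.7)
    seed_high = tri(seeds, 0.3, 1.0, 1.7)
    w_high = min(cov_high, seed_high)          # firing strength (min as AND)
    w_low = min(1 - cov_high, 1 - seed_high)
    # weighted-average defuzzification over risk prototypes {low: 0.2, high: 0.9}
    return (w_low * 0.2 + w_high * 0.9) / (w_low + w_high + 1e-9)

# infestation_risk(0.8, 0.9) -> risk score near 0.9 (high)
```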
Abstract:
Non-Hodgkin lymphomas are of many distinct types, and the different classification systems in use make it difficult to diagnose them correctly. Many of these systems classify lymphomas based only on their appearance under a microscope. In 2008 the World Health Organisation (WHO) introduced the most recent system, which also considers the chromosomal features of the lymphoma cells and the presence of certain proteins on their surface. The WHO system is the one we apply in this work. Here we present an automatic method to classify histological images of three types of non-Hodgkin lymphoma. Our method is based on the Stationary Wavelet Transform (SWT) and consists of three steps: 1) extracting sub-bands from the histological image through the SWT; 2) applying Analysis of Variance (ANOVA) to remove noise and select the most relevant information; 3) classifying the result with the Support Vector Machine (SVM) algorithm. The Linear, RBF, and Polynomial kernel types were evaluated with our method applied to 210 lymphoma images from the National Institute on Aging. We conclude that the following combination led to the most relevant results: detail sub-band, ANOVA, and SVM with the Linear and RBF kernels.
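The three-step pipeline could be sketched as below, assuming PyWavelets for the SWT and scikit-learn for the ANOVA selection and the SVM; the wavelet family, decomposition level, and sub-band statistics are illustrative choices:

```python
import numpy as np
import pywt
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def swt_features(image, wavelet="db2", level=2):
    """Step 1: stationary wavelet transform of a grayscale image whose
    sides are divisible by 2**level; summarize each detail sub-band."""
    feats = []
    for _, (cH, cV, cD) in pywt.swt2(image, wavelet, level=level):
        for band in (cH, cV, cD):
            feats += [band.mean(), band.std(), np.abs(band).mean()]
    return np.array(feats)

# images: list of 2D grayscale arrays; y: labels for the three lymphoma types
# X = np.stack([swt_features(img) for img in images])
# Steps 2 and 3: ANOVA-based feature selection feeding a linear-kernel SVM
# clf = make_pipeline(SelectKBest(f_classif, k=6), SVC(kernel="linear"))
# clf.fit(X, y)
```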
Abstract:
The development of cloud computing services is speeding up the rate at which organizations outsource their computational services or sell their idle computational resources. Even though migrating to the cloud remains a tempting trend from a financial perspective, there are several other aspects that must be taken into account by companies before they decide to do so. One of the most important aspects is security: while some cloud computing security issues are inherited from the solutions adopted to create such services, many new security questions that are particular to these solutions also arise, including those related to how the services are organized and what kind of service/data can be placed in the cloud. Aiming to give a better understanding of this complex scenario, in this article we identify and classify the main security concerns and solutions in cloud computing, and propose a taxonomy of security in cloud computing, giving an overview of the current status of security in this emerging technology.
Abstract:
This thesis explores the capabilities of heterogeneous multi-core systems based on multiple Graphics Processing Units (GPUs) in a standard desktop framework. Multi-GPU accelerated desk-side computers are an appealing alternative to other high performance computing (HPC) systems: being composed of commodity hardware components fabricated in large quantities, their price-performance ratio is unparalleled in the world of high performance computing. Essentially bringing "supercomputing to the masses", this opens up new possibilities for application fields where investing in HPC resources had previously been considered unfeasible. One of these is the field of bioelectrical imaging, a class of medical imaging technologies that occupy a low-cost niche next to million-dollar systems like functional Magnetic Resonance Imaging (fMRI). In the scope of this work, several computational challenges encountered in bioelectrical imaging are tackled with this new kind of computing resource, striving to help these methods approach their true potential. Specifically, the following main contributions were made: Firstly, a novel dual-GPU implementation of parallel triangular matrix inversion (TMI) is presented, addressing a crucial kernel in the computation of multi-mesh head models for electroencephalographic (EEG) source localization. This includes not only a highly efficient implementation of the routine itself, achieving excellent speedups versus an optimized CPU implementation, but also a novel GPU-friendly compressed storage scheme for triangular matrices. Secondly, a scalable multi-GPU solver for non-Hermitian linear systems was implemented. It is integrated into a simulation environment for electrical impedance tomography (EIT) that requires the frequent solution of complex systems with millions of unknowns, a task that this solution can perform within seconds. In terms of computational throughput, it outperforms not only a highly optimized multi-CPU reference, but related GPU-based work as well. Finally, a GPU-accelerated graphical EEG real-time source localization software was implemented. Thanks to the acceleration, it can meet real-time requirements in unprecedented anatomical detail while running more complex localization algorithms. Additionally, a novel implementation to extract anatomical priors from static Magnetic Resonance (MR) scans has been included.
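The TMI kernel can be illustrated with the standard recursive blocking of a lower-triangular inverse, which turns most of the work into large matrix products, the operation GPUs excel at. A plain NumPy sketch (the thesis' dual-GPU partitioning and compressed storage scheme are not reproduced):

```python
import numpy as np

def tri_inv(L, block=64):
    """Invert a lower-triangular matrix by recursive 2x2 blocking; the
    recursion turns most of the work into large matrix products, which is
    what makes the kernel GPU-friendly (shown here in plain NumPy)."""
    n = L.shape[0]
    if n <= block:
        return np.linalg.solve(L, np.eye(n))   # small base case
    k = n // 2
    inv11 = tri_inv(L[:k, :k], block)
    inv22 = tri_inv(L[k:, k:], block)
    out = np.zeros_like(L)
    out[:k, :k] = inv11
    out[k:, k:] = inv22
    out[k:, :k] = -inv22 @ L[k:, :k] @ inv11   # off-diagonal block update
    return out
```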
Abstract:
A classification of injuries is necessary in order to develop a common language for treatment indications and outcomes. Several classification systems have been proposed, the most frequently used being the Denis classification. The problem with this classification system is that it is based on an assumption that is anatomically unidentifiable: the so-called middle column. For this reason, a few years ago a group of spine surgeons developed a new classification system based on the severity of the injury. The severity is defined by the pathomorphological findings, the prognosis in terms of healing, and the potential for neurological damage. This classification is based on three major groups: A = isolated anterior column injuries by axial compression; B = disruption of the posterior ligament complex by distraction posteriorly; and C = corresponding to group B but with rotation. Severity increases from A to C, and within each group it usually increases across the subgroups .1, .2, and .3. Each of these pathomorphologies is produced by a mechanism of injury, which is responsible for the extent of the injury. The type of injury, with its groups and subgroups, is able to suggest the treatment modality.
Abstract:
OBJECTIVE: The aim of this study was to establish an MRI classification system for intervertebral disks using axial T2 mapping, with a special focus on evaluating early degenerative intervertebral disks. MATERIALS AND METHODS: Twenty-nine healthy volunteers (19 men, 10 women; age range, 20-44 years; mean age, 31.8 years) were studied, and axial T2 mapping was performed for the L3-L4, L4-L5, and L5-S1 intervertebral disks. Grading was performed using three classification systems for degenerative disks: our system using axial T2 mapping and two conventional classification systems that focus on the signal intensity of the nucleus pulposus or the structural morphology in sagittal T2-weighted MR images. We analyzed the relationship between T2, which is known to correlate with changes in the composition of intervertebral disks, and the degenerative grade determined using the three classification systems. RESULTS: With axial T2 mapping, differences in T2 between grades I and II were smaller, and those between grades II and III and between grades III and IV were larger, than with the other grading systems. The ratio of intervertebral disks classified as grade I was higher with the conventional classification systems than with axial T2 mapping. In contrast, the ratio of intervertebral disks classified as grade II or III was higher with axial T2 mapping than with the conventional classification systems. CONCLUSION: Axial T2 mapping provides a classification more directly based on T2 values. The new system may be able to detect early degenerative changes before the conventional classification systems can.
Abstract:
The comprehensive Hearing Preservation classification system presented in this paper is suitable for all cochlear implant users with measurable pre-operative residual hearing. If adopted as a universal reporting standard, as it was designed to be, it should prove highly beneficial by enabling future studies to quickly and easily compare the results of previous studies and meta-analyze their data. Objectives: To develop a comprehensive Hearing Preservation classification system suitable for all cochlear implant users with measurable pre-operative residual hearing. Methods: The HEARRING group discussed and reviewed a number of different propositions for an HP classification system and reviewed critical appraisals to develop a qualitative system in accordance with the prerequisites. Results: The Hearing Preservation classification system proposed herein fulfills the following necessary criteria: 1) classification is independent of users' initial hearing; 2) it is appropriate for all cochlear implant users with measurable pre-operative residual hearing; 3) it covers the whole range of pure tone averages from 0 to 120 dB; 4) it is easy to use and easy to understand.
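Assuming the formula from the HEARRING group's published proposal, which expresses preservation as the percentage of the pre-operative hearing reserve that remains after surgery, the classification might be sketched as follows (the formula and category thresholds below are assumptions based on the published proposal, not taken from this abstract):

```python
def hearing_preservation(pta_pre, pta_post, pta_max=120.0):
    """Hearing preservation as the percentage of the pre-operative hearing
    reserve that remains post-operatively (formula assumed from the
    HEARRING proposal): S = (1 - (PTApost - PTApre)/(PTAmax - PTApre)) * 100."""
    s = (1.0 - (pta_post - pta_pre) / (pta_max - pta_pre)) * 100.0
    if s <= 0:
        return "loss of hearing"            # category thresholds are assumed
    if s <= 25:
        return "minimal hearing preservation"
    if s < 75:
        return "partial hearing preservation"
    return "complete hearing preservation"
```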
Abstract:
PURPOSE Validity of the seventh edition of the American Joint Committee on Cancer/International Union Against Cancer (AJCC/UICC) staging system for gastric cancer has been evaluated in several studies, mostly in Asian patient populations. Few data are available on the prognostic implications of the new classification system in a Western population. Therefore, we investigated its prognostic ability based on a German patient cohort. PATIENTS AND METHODS Data from a single-center cohort of 1,767 consecutive patients surgically treated for gastric cancer were classified according to the seventh edition and compared with the previous TNM/UICC classification. Kaplan-Meier analyses were performed for all TNM stages and UICC stages in a comparative manner. Additional survival receiver operating characteristic analyses and bootstrap-based goodness-of-fit comparisons via the Bayesian information criterion (BIC) were performed to assess and compare the prognostic performance of the competing classification systems. RESULTS We identified the UICC pT/pN stages according to the seventh edition of the AJCC/UICC guidelines, as well as resection status, age, Lauren histotype, lymph-node ratio, and tumor grade, as independent prognostic factors in gastric cancer, which is consistent with data from previous Asian studies. Overall survival rates according to the new edition were significantly different for each individual pT, pN, and UICC stage. However, BIC analysis revealed that, owing to its higher complexity, the new staging system might not significantly alter predictability of overall survival compared with the old system within the analyzed cohort from a statistical point of view. CONCLUSION The seventh edition of the AJCC/UICC classification was found to be valid, with distinctive prognosis for each stage. However, the AJCC/UICC classification has become more complex without improving predictability of overall survival in a Western population. Therefore, simplification with better predictability of overall survival of patients with gastric cancer should be considered when revising the seventh edition.
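The BIC-based goodness-of-fit comparison can be sketched by fitting one Cox model per staging system, for instance with the lifelines library; the column names and the dummy coding of stages are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def staging_bic(df, stage_col, duration_col="survival_months", event_col="death"):
    """Fit a Cox model on one staging system (dummy-coded stages) and return
    its BIC; a lower BIC means better fit after penalizing model complexity."""
    design = pd.get_dummies(df[stage_col], prefix=stage_col,
                            drop_first=True).astype(float)
    design[duration_col] = df[duration_col]
    design[event_col] = df[event_col]
    cph = CoxPHFitter().fit(design, duration_col=duration_col, event_col=event_col)
    k, n = len(cph.params_), len(design)
    return k * np.log(n) - 2.0 * cph.log_likelihood_

# bic_old = staging_bic(cohort, "uicc_stage_6th")   # 'cohort' is hypothetical
# bic_new = staging_bic(cohort, "uicc_stage_7th")   # more stages -> larger penalty
```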
Abstract:
Web 2.0 applications enabled users to classify information resources using their own vocabularies. The bottom-up nature of these user-generated classification systems has turned them into interesting knowledge sources, since they provide a rich terminology generated by potentially large user communities. Previous research has shown that it is possible to elicit some emergent semantics from the aggregation of individual classifications in these systems. However, the generation of ontologies from them is still an open research problem. In this thesis we address the problem of how to tap into user-generated classification systems for building domain ontologies. Our objective is to design a method to develop domain ontologies from user-generated classification systems. To do so, we rely on ontologies in the Web of Data to formalize the semantics of the knowledge collected from the classification system. Current ontology development methodologies have recognized the importance of reusing knowledge from existing resources; thus, our work is framed within the NeOn methodology scenario for building ontologies by reusing and reengineering non-ontological resources. The main contributions of this work are: an integrated method to develop ontologies from user-generated classification systems, with which we extract a domain terminology from the classification system and then formalize the semantics of this terminology by reusing ontologies in the Web of Data; the identification and adaptation of existing techniques for implementing the activities in the method so that they can fulfill the requirements of each activity; and a novel study of emergent semantics in user-generated lists.
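The reuse step, looking up candidate ontology terms in the Web of Data for a term extracted from the classification system, might be sketched as a SPARQL query (the endpoint and query shape are illustrative assumptions; DBpedia is used here merely as an example of a Web of Data source):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

def candidate_classes(term, endpoint="https://dbpedia.org/sparql"):
    """Find ontology classes in a Web of Data endpoint whose rdfs:label
    matches a term extracted from the user-generated classification system."""
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery("""
        PREFIX owl:  <http://www.w3.org/2002/07/owl#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT DISTINCT ?cls WHERE {
            ?cls a owl:Class ;
                 rdfs:label ?label .
            FILTER (lcase(str(?label)) = "%s")
        } LIMIT 10
    """ % term.lower())
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    return [b["cls"]["value"] for b in results["results"]["bindings"]]

# candidate_classes("recipe") -> candidate class URIs to reuse in the ontology
```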
Abstract:
A membrane system is a massively parallel system inspired by the way living cells process information. As part of unconventional computing, membrane systems have proven effective in solving complex problems. This paper introduces a new factor, the P-factor, that can decide whether a given strategy for the rules application phase is worth using: it indicates the strategy with the best chance of reducing execution time within the membrane system. A pre-analysis of the membrane system determines the P-factor, which in turn advises the optimal strategy to use. In particular, this paper compares the use of two strategies based on the P-factor and reports results from applying them. The paper concludes that the P-factor is an effective indicator for choosing the right strategy to implement the rules application phase in membrane systems.
Abstract:
The main question addressed in this thesis is the improvement of automatic speaker recognition systems by the introduction of a new front-end module that we have called Gender-Dependent Extended Biometric Parameterisation (GDEBP). This front-end does not constitute a complete break with classical parameterisation techniques used in speaker recognition, but rather a new way to obtain these parameters while introducing some complementary ones.
Specifically, we propose a gender-dependent parameterisation since, as is well known, male and female voices have different characteristics, and therefore the use of different parameters to model these distinguishing characteristics should provide a better characterisation of speakers. Additionally, we propose the introduction of a new set of biometric parameters extracted from the components that result from the deconstruction of the voice into its glottal source estimate (closely related to the phonation process and the organs involved, and therefore to the physical characteristics of the speaker) and vocal tract estimate (closely related to acoustic articulation and therefore to the spoken message). These biometric parameters constitute a complement to the classical MFCC extracted from the power spectral density of speech as a whole. In order to check the validity of this proposal we establish different practical scenarios, using different databases, so we can conclude that GDEBP generates a more accurate description of speakers than classical approaches based on gender-independent MFCC. Specifically, we propose scenarios based on text-constrained and text-independent tests using the HESPERIA and ALBAYZIN databases. This work is also completed with the participation in two international speaker recognition evaluations: NIST SRE (2010 and 2012) and MOBIO 2013, with diverse results. In the first case, due to the nature of the NIST databases, we obtained results close to the state of the art while confirming our hypothesis, whereas in the MOBIO SRE we obtained the best simple-system performance for female speakers. Although the study of classification systems is beyond the scope of this thesis, we found it necessary to analyse the performance of different classification systems in order to verify their effect on the proposed parameterisation. In particular, we have addressed the use of speaker recognition systems based on the GMM-UBM paradigm, supervectors, and i-vectors. The presented results confirm that the selection of a set of parameters that allows for a more accurate description of the speakers is as important as the selection of the classification method used by the biometric system. In this sense, the proposed parameterisation constitutes a step forward in improving speaker recognition systems, since really competitive recognition rates are achieved even with relatively simple classification systems.
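A compact sketch of the GMM-UBM paradigm mentioned above, using scikit-learn mixtures; the relevance factor, the feature layout (frames by parameters, e.g. GDEBP vectors), and all function names are illustrative assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(background_feats, n_components=64):
    """Universal background model fitted on pooled background speech."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="diag").fit(background_feats)

def map_adapt_means(ubm, speaker_feats, r=16.0):
    """Relevance-MAP adaptation of the UBM means to one enrolled speaker."""
    resp = ubm.predict_proba(speaker_feats)            # frame posteriors (N, K)
    n_k = resp.sum(axis=0)                             # soft counts per component
    ex = (resp.T @ speaker_feats) / np.maximum(n_k, 1e-9)[:, None]
    alpha = (n_k / (n_k + r))[:, None]                 # adaptation weights
    model = GaussianMixture(n_components=ubm.n_components,
                            covariance_type="diag")
    model.weights_ = ubm.weights_
    model.covariances_ = ubm.covariances_
    model.precisions_cholesky_ = ubm.precisions_cholesky_
    model.means_ = alpha * ex + (1 - alpha) * ubm.means_
    return model

def llr_score(speaker_model, ubm, test_feats):
    """Verification score: average log-likelihood ratio, speaker vs UBM."""
    return speaker_model.score(test_feats) - ubm.score(test_feats)
```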