820 results for Data classification
Abstract:
Brain injury constitutes a serious social and health problem of increasing magnitude and of great diagnostic and therapeutic complexity. Its high incidence and survival rate, after the initial critical phases, make it a prevalent problem that needs to be addressed. In particular, according to the World Health Organization (WHO), brain injury will be among the 10 most common causes of disability by 2020.
Neurorehabilitation improves both cognitive and functional deficits and increases the autonomy of brain injury patients. The incorporation of new technologies into neurorehabilitation aims to establish a new paradigm focused on designing intensive, personalized, monitored and evidence-based treatments, since these four characteristics are what ensure that treatments are effective. Unlike most medical disciplines, there are no associations of symptoms and signs of cognitive impairment that can guide the therapist. Currently, neurorehabilitation treatments are planned on the basis of the results obtained from a neuropsychological assessment battery, which evaluates the level of impairment of each cognitive function (memory, attention, executive functions, etc.). The research line in which this PhD thesis is framed aims to design and develop a cognitive profile based not only on the results obtained in that assessment battery, but also on theoretical information covering both anatomical structures and functional relationships, and on anatomical information obtained from medical imaging studies, such as magnetic resonance imaging. The cognitive profile used to design these treatments therefore integrates personalized and evidence-based information. Neuroimaging techniques represent an essential tool to identify lesions and generate this type of cognitive profile. The classical approach to identifying brain anatomical regions is to delineate them manually, which presents several problems related to inconsistent criteria across clinicians, repeatability and time. Automated delineation is done by registering brains to one another or to a template. However, when imaging studies contain lesions, intensity abnormalities and location alterations reduce the performance of most registration algorithms based on intensity parameters. Specialists may thus have to interact manually with imaging studies to select landmarks (called singular points in this PhD thesis) or identify regions of interest, solutions that suffer from the same drawbacks as the manual approaches mentioned before. Moreover, these registration algorithms do not allow large and distributed deformations, which may also appear when a stroke or a traumatic brain injury (TBI) occurs. This PhD thesis is focused on the design, development and implementation of a new methodology to automatically identify lesions in anatomical structures. This methodology integrates algorithms whose main objective is to generate objective and reproducible results. It is divided into four stages: pre-processing, singular point identification, registration and lesion detection. Pre-processing. In this first stage, the aim is to standardize all input data so that valid conclusions can be drawn from the results; this stage therefore has a direct impact on the final results. It consists of three steps: skull stripping, intensity normalization and spatial normalization. Singular point identification. This stage aims to automate the identification of anatomical points (singular points), replacing their manual identification by the clinician. This automation makes it possible to identify a greater number of points, which yields more information; to remove the factor associated with inter-subject variability, so that the results are reproducible and objective; and to eliminate the time spent on manual marking.
This PhD thesis proposes an algorithm (a descriptor) to automatically identify singular points, based on a multi-detector approach that combines multi-parametric (spatial and intensity) information. This algorithm has been compared with similar algorithms found in the state of the art. Registration. The goal of this stage is to bring two imaging studies of different subjects/patients into spatial correspondence. The algorithm proposed in this PhD thesis is based on descriptors, and its main objective is to compute a vector field able to introduce distributed deformations (changes in different regions of the image) as large as the associated deformation vector indicates. The proposed algorithm has been compared with other registration algorithms used in neuroimaging applications with control subjects. The results obtained are promising and represent a new context for the automatic identification of anatomical structures. Lesion identification. This final stage aims to identify those anatomical structures whose characteristics associated with spatial location and area or volume have been modified with respect to a normal state. To this end, a statistical study of the atlas to be used is performed to establish the statistical parameters of normality associated with location and area. The anatomical structures that can be identified depend on the structures delineated in the atlas, although the proposed methodology is independent of the atlas selected. Overall, this PhD thesis corroborates the postulated research hypotheses regarding the automatic identification of lesions based on structural medical imaging studies, specifically magnetic resonance studies. Based on these foundations, new research fields can be opened to improve the detection of lesions in brain injury.
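As a concrete illustration of this last stage, the following minimal Python sketch flags a structure as lesioned when its centroid location or area deviates from atlas normality parameters by more than a chosen number of standard deviations. The structure name, statistics and the 2.5-sigma threshold are illustrative assumptions, not values from the thesis.

```python
# A minimal sketch of the statistical lesion-detection step: each delineated
# structure is compared against normality parameters (mean and standard
# deviation of centroid location and area) estimated from the atlas population.
import numpy as np

# Hypothetical atlas statistics: structure -> (mean centroid, std centroid,
# mean area, std area), estimated from a population of control studies.
ATLAS_STATS = {
    "left_hippocampus": (np.array([63.0, 41.0, 22.0]), np.array([2.1, 1.8, 1.5]),
                         3450.0, 310.0),
}

def is_lesioned(name, centroid, area, z_thresh=2.5):
    """Flag a structure whose location or area deviates from atlas normality."""
    mean_c, std_c, mean_a, std_a = ATLAS_STATS[name]
    z_location = np.abs((np.asarray(centroid) - mean_c) / std_c)
    z_area = abs(area - mean_a) / std_a
    return bool(z_location.max() > z_thresh or z_area > z_thresh)

# Displaced and shrunken structure -> flagged as lesioned.
print(is_lesioned("left_hippocampus", centroid=[70.0, 41.5, 22.3], area=2300.0))
```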
Abstract:
Objective: To evaluate the impact of the revised diagnostic criteria for diabetes mellitus adopted by the American Diabetes Association on prevalence of diabetes and on classification of patients. For epidemiological purposes the American criteria use a fasting plasma glucose concentration ⩾7.0 mmol/l in contrast with the current World Health Organisation criteria of 2 hour glucose concentration ⩾11.1 mmol/l.
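The two thresholds can be made concrete with a short sketch; the function names are illustrative, and real diagnostic practice involves confirmatory testing.

```python
# Worked example of the two diagnostic thresholds compared in the study: the
# ADA criterion uses fasting plasma glucose >= 7.0 mmol/l, while the WHO
# criterion uses a 2-hour plasma glucose >= 11.1 mmol/l.
def ada_diabetic(fasting_glucose_mmol_l: float) -> bool:
    return fasting_glucose_mmol_l >= 7.0

def who_diabetic(two_hour_glucose_mmol_l: float) -> bool:
    return two_hour_glucose_mmol_l >= 11.1

# A patient can be classified differently by the two criteria, which is
# exactly the reclassification effect the study evaluates.
print(ada_diabetic(7.4), who_diabetic(9.8))  # True False: ADA yes, WHO no
```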
Abstract:
In order to support the structural genomic initiatives, both by rapidly classifying newly determined structures and by suggesting suitable targets for structure determination, we have recently developed several new protocols for classifying structures in the CATH domain database (http://www.biochem.ucl.ac.uk/bsm/cath). These aim to increase the speed of classification of new structures using fast algorithms for structure comparison (GRATH) and to improve the sensitivity in recognising distant structural relatives by incorporating sequence information from relatives in the genomes (DomainFinder). In order to ensure the integrity of the database given the expected increase in data, the CATH Protein Family Database (CATH-PFDB), which currently includes 25 320 structural domains and a further 160 000 sequence relatives has now been installed in a relational ORACLE database. This was essential for developing more rigorous validation procedures and for allowing efficient querying of the database, particularly for genome analysis. The associated Dictionary of Homologous Superfamilies [Bray,J.E., Todd,A.E., Pearl,F.M.G., Thornton,J.M. and Orengo,C.A. (2000) Protein Eng., 13, 153–165], which provides multiple structural alignments and functional information to assist in assigning new relatives, has also been expanded recently and now includes information for 903 homologous superfamilies. In order to improve coverage of known structures, preliminary classification levels are now provided for new structures at interim stages in the classification protocol. Since a large proportion of new structures can be rapidly classified using profile-based sequence analysis [e.g. PSI-BLAST: Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402], this provides preliminary classification for easily recognisable homologues, which in the latest release of CATH (version 1.7) represented nearly three-quarters of the non-identical structures.
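As a rough illustration of the preliminary, sequence-based classification step, the sketch below assigns a new structure the CATH code of its closest already-classified relative when similarity exceeds a cutoff. CATH actually uses PSI-BLAST profile searches; the naive difflib ratio, example sequences and code mappings here are stand-ins for illustration only.

```python
# Toy sketch of profile/sequence-based preliminary classification: a new
# structure inherits the CATH code (C.A.T.H) of its best sequence relative
# when the similarity exceeds a cutoff.
from difflib import SequenceMatcher

CLASSIFIED = {  # hypothetical mapping: sequence -> CATH code
    "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ": "3.30.70.330",
    "GSHMTEYKLVVVGAGGVGKSALTIQLIQNHFVD": "3.40.50.300",
}

def preliminary_cath(query, cutoff=0.35):
    best_code, best_score = None, 0.0
    for seq, code in CLASSIFIED.items():
        score = SequenceMatcher(None, query, seq).ratio()
        if score > best_score:
            best_code, best_score = code, score
    return best_code if best_score >= cutoff else None  # None -> needs GRATH

# An easily recognisable homologue gets a preliminary classification at once.
print(preliminary_cath("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVA"))
```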
Abstract:
PDB-REPRDB is a database of representative protein chains from the Protein Data Bank (PDB). The previous version of PDB-REPRDB provided 48 representative sets, whose similarity criteria were predetermined, on the WWW. The current version is designed so that the user may obtain a quick selection of representative chains from PDB. The selection of representative chains can be dynamically configured according to the user’s requirement. The WWW interface provides a large degree of freedom in setting parameters, such as cut-off scores of sequence and structural similarity. One can obtain a representative list and classification data of protein chains from the system. The current database includes 20 457 protein chains from PDB entries (August 6, 2000). The system for PDB-REPRDB is available at the Parallel Protein Information Analysis system (PAPIA) WWW server (http://www.rwcp.or.jp/papia/).
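The configurable representative-set selection can be pictured with a greedy "leader" sketch: a chain joins the representative list only if it falls below the user-chosen similarity cutoff against every chain already kept. The identity measure and data below are illustrative; the real system also applies structural similarity scores.

```python
# A sketch of greedy representative selection under a user-defined cutoff,
# in the spirit of PDB-REPRDB's dynamically configurable selection.
from difflib import SequenceMatcher

def select_representatives(chains, identity_cutoff=0.9):
    """Keep a chain only if it is below the cutoff against all kept chains."""
    representatives = []
    for chain_id, seq in chains:
        if all(SequenceMatcher(None, seq, rep_seq).ratio() < identity_cutoff
               for _, rep_seq in representatives):
            representatives.append((chain_id, seq))
    return representatives

chains = [("1ABC_A", "MKTAYIAKQR"), ("2XYZ_A", "MKTAYIAKQR"), ("3DEF_B", "GGSSLVNNAQ")]
print([cid for cid, _ in select_representatives(chains)])  # ['1ABC_A', '3DEF_B']
```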
Abstract:
The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP.
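A toy sketch of rule-based annotation with status tags, in the spirit described above: pattern rules produce features tagged "predicted", while curated features carry an "experimental" tag. The rules, patterns and tags are invented for illustration.

```python
# Minimal sketch of rule-based, classification-driven annotation with status
# tags distinguishing experimentally determined from predicted features.
import re

RULES = [  # hypothetical controlled-vocabulary rules: (regex, feature name)
    (re.compile(r"C..C.{10,20}C..C"), "zinc finger region"),
    (re.compile(r"^M"), "initiator methionine"),
]

def annotate(sequence, curated_features=()):
    features = [(name, "experimental") for name in curated_features]
    for pattern, name in RULES:
        if pattern.search(sequence):
            features.append((name, "predicted"))
    return features

print(annotate("MKTCAACAEKGGKLVNNAQCLRCAA", curated_features=["ATP-binding site"]))
```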
Abstract:
The Dali Domain Dictionary (http://www.ebi.ac.uk/dali/domain) is a numerical taxonomy of all known structures in the Protein Data Bank (PDB). The taxonomy is derived fully automatically from measurements of structural, functional and sequence similarities. Here, we report the extension of the classification to match the traditional four hierarchical levels corresponding to: (i) supersecondary structural motifs (attractors in fold space), (ii) the topology of globular domains (fold types), (iii) remote homologues (functional families) and (iv) homologues with sequence identity above 25% (sequence families). The computational definitions of attractors and functional families are new. In September 2000, the Dali classification contained 10 531 PDB entries comprising 17 101 chains, which were partitioned into five attractor regions, 1375 fold types, 2582 functional families and 3724 domain sequence families. Sequence families were further associated with 99 582 unique homologous sequences in the HSSP database, which increases the number of effectively known structures several-fold. The resulting database contains the description of protein domain architecture, the definition of structural neighbours around each known structure, the definition of structurally conserved cores and a comprehensive library of explicit multiple alignments of distantly related protein families.
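The lowest level of the hierarchy can be illustrated with a small single-linkage sketch that merges chains into sequence families whenever pairwise identity exceeds 25%. The union-find scaffold is standard; the difflib ratio is only a crude stand-in for an alignment-based identity score.

```python
# Sketch of the sequence-family level: single-linkage clustering of chains
# at >25% pairwise identity, using a union-find structure.
from difflib import SequenceMatcher

def identity(a, b):
    return SequenceMatcher(None, a, b).ratio()  # crude proxy for % identity

def sequence_families(seqs, threshold=0.25):
    parent = list(range(len(seqs)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if identity(seqs[i], seqs[j]) > threshold:
                parent[find(i)] = find(j)  # merge the two families
    families = {}
    for i in range(len(seqs)):
        families.setdefault(find(i), []).append(i)
    return list(families.values())

print(sequence_families(["MKTAYIAKQR", "MKTAYIAKQK", "GGSSLVNNAQ"]))  # [[0, 1], [2]]
```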
Abstract:
The SWISS-PROT group at EBI has developed the Proteome Analysis Database utilising existing resources and providing comparative analysis of the predicted protein coding sequences of the complete genomes of bacteria, archaea and eukaryotes (http://www.ebi.ac.uk/proteome/). The two main projects used, InterPro and CluSTr, give a new perspective on families, domains and sites and cover 31–67% (InterPro statistics) of the proteins from each of the complete genomes. CluSTr covers the three complete eukaryotic genomes and the incomplete human genome data. The Proteome Analysis Database is accompanied by a program that has been designed to carry out InterPro proteome comparisons for any one proteome against any other one or more of the proteomes in the database.
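A minimal sketch of the kind of proteome comparison the accompanying program performs: count proteins per InterPro entry in each proteome and list the differences. The input format and example data are illustrative assumptions.

```python
# Sketch of an InterPro-based proteome-vs-proteome comparison: tally how many
# proteins in each proteome carry each InterPro entry, then report both counts.
from collections import Counter

def interpro_counts(protein2entries):
    """protein2entries: dict protein id -> set of InterPro accessions."""
    counts = Counter()
    for entries in protein2entries.values():
        counts.update(entries)
    return counts

proteome_a = {"P1": {"IPR000719"}, "P2": {"IPR000719", "IPR001245"}}
proteome_b = {"Q1": {"IPR001245"}}

ca, cb = interpro_counts(proteome_a), interpro_counts(proteome_b)
for entry in sorted(set(ca) | set(cb)):
    print(entry, ca[entry], cb[entry])
```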
Abstract:
The Identification and Classification of Bacteria (ICB) database (http://www.mbio.co.jp/icb) contains currently available information about the DNA gyrase subunit B (gyrB) gene in bacteria. The database is designed to provide the scientific community with a reference point for using gyrB as an evolutionary and taxonomic marker. Nucleic and amino acid sequence data are currently available for over 850 strains, along with alignments at several different taxonomic levels and an exhaustive review of primer selection and background information.
Abstract:
Of the approximately 380 families of angiosperms, representatives of only 10 are known to form symbiotic associations with nitrogen-fixing bacteria in root nodules. The morphologically based classification schemes proposed by taxonomists suggest that many of these 10 families of plants are only distantly related, engendering the hypothesis that the capacity to fix nitrogen evolved independently several, if not many, times. This has in turn influenced attitudes toward the likelihood of transferring genes responsible for symbiotic nitrogen fixation to crop species lacking this ability. Phylogenetic analysis of DNA sequences for the chloroplast gene rbcL indicates, however, that representatives of all 10 families with nitrogen-fixing symbioses occur together, with several families lacking this association, in a single clade. This study therefore indicates that only one lineage of closely related taxa achieved the underlying genetic architecture necessary for symbiotic nitrogen fixation in root nodules.
Abstract:
In order to protect critical military and commercial space assets, the United States Space Surveillance Network must have the ability to positively identify and characterize all space objects. Unfortunately, positive identification and characterization of space objects is today a manual and labor-intensive process, since even large telescopes cannot provide resolved images of most space objects. Because resolved images of geosynchronous satellites are not technically feasible with current technology, another method of distinguishing space objects was explored that exploits the polarization signature from unresolved images. The objective of this study was to collect and analyze visible-spectrum polarization data from unresolved images of geosynchronous satellites taken over various solar phase angles. Different collection geometries were used to evaluate the polarization contribution of solar arrays, thermal control materials, antennas, and the satellite bus as the solar phase angle changed. Since materials on space objects age due to the space environment, it was postulated that their polarization signature may change enough to allow discrimination of identical satellites launched at different times. The instrumentation used in this experiment was a United States Air Force Academy (USAFA) Department of Physics system that consists of a 20-inch Ritchey-Chrétien telescope and a dual focal plane optical train fed with a polarizing beam splitter. A rigorous calibration of the system was performed that included corrections for pixel bias, dark current, and response. Additionally, the two-channel polarimeter was calibrated by experimentally determining the Mueller matrix for the system and relating image intensity at the two cameras to Stokes parameters S0 and S1. After the system calibration, polarization data were collected during three nights on eight geosynchronous satellites built by various manufacturers and launched several years apart. Three pairs of the eight satellites shared identical buses, making it possible to test whether identical buses could be correctly differentiated. When Stokes parameters were plotted against time and solar phase angle, the data indicated distinguishing features in S0 (total intensity) and S1 (linear polarization) that may lead to positive identification or classification of each satellite.
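The reduction from the two camera intensities to Stokes parameters can be sketched as a small linear inversion. For an ideal polarizing beam splitter, S0 = I_p + I_s and S1 = I_p - I_s; the study instead uses an experimentally determined Mueller matrix, for which the hypothetical 2x2 calibration matrix below stands in.

```python
# Sketch of recovering (S0, S1) from the two polarimeter channels. The system
# calibration is modelled as a generic 2x2 matrix A with I = A @ (S0, S1);
# the numbers are invented, the ideal PBS case would be [[0.5, 0.5], [0.5, -0.5]].
import numpy as np

A = np.array([[0.52, 0.49],    # hypothetical calibrated system matrix
              [0.48, -0.51]])

def stokes_from_intensities(i_cam1, i_cam2):
    """Invert the calibration to recover (S0, S1) from the two cameras."""
    s0, s1 = np.linalg.solve(A, np.array([i_cam1, i_cam2]))
    return s0, s1

s0, s1 = stokes_from_intensities(1.02, 0.37)
print(f"S0={s0:.3f}, S1={s1:.3f}, degree of linear pol. ~ {s1 / s0:.3f}")
```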
Abstract:
Background: The harmonization of European health systems brings with it a need for tools to allow the standardized collection of information about medical care. A common coding system and standards for the description of services are needed to allow local data to be incorporated into evidence-informed policy, and to permit equity and mobility to be assessed. The aim of this project has been to design such a classification and a related tool for the coding of services for Long Term Care (DESDE-LTC), based on the European Service Mapping Schedule (ESMS). Methods: The development of DESDE-LTC followed an iterative process using nominal groups in 6 European countries. 54 researchers and stakeholders in health and social services contributed to this process. In order to classify services, we use the minimal organization unit or “Basic Stable Input of Care” (BSIC), coded by its principal function or “Main Type of Care” (MTC). The evaluation of the tool included an analysis of feasibility, consistency, ontology, inter-rater reliability, Boolean Factor Analysis, and a preliminary impact analysis (screening, scoping and appraisal). Results: DESDE-LTC includes an alpha-numerical coding system, a glossary and an assessment instrument for mapping and counting LTC. It shows high feasibility, consistency, inter-rater reliability and face, content and construct validity. DESDE-LTC is ontologically consistent. It is regarded by experts as useful and relevant for evidence-informed decision making. Conclusion: DESDE-LTC contributes to establishing a common terminology, taxonomy and coding of LTC services in a European context, and a standard procedure for data collection and international comparison.
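The BSIC/MTC coding idea can be sketched as a tiny data structure; the codes below are placeholders, not actual DESDE-LTC codes.

```python
# Sketch of the coding idea behind DESDE-LTC: each minimal organizational unit
# (Basic Stable Input of Care, BSIC) is coded by its principal function
# (Main Type of Care, MTC) using an alpha-numerical code.
from collections import Counter
from dataclasses import dataclass

@dataclass
class BSIC:
    name: str
    mtc_code: str   # alpha-numerical Main Type of Care code (hypothetical)
    country: str

services = [
    BSIC("Community rehabilitation team", "O5.1", "ES"),
    BSIC("Long-stay residential facility", "R4", "ES"),
]

# Counting MTCs gives the standardized service availability comparison
# that the instrument is designed to support.
print(Counter(s.mtc_code for s in services))
```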
Abstract:
3D sensors provide valuable information for mobile robotic tasks like scene classification or object recognition, but these sensors often produce noisy data that makes it impossible to apply classical keypoint detection and feature extraction techniques. Therefore, noise removal and downsampling have become essential steps in 3D data processing. In this work, we propose the use of a 3D filtering and down-sampling technique based on a Growing Neural Gas (GNG) network. The GNG method is able to deal with outliers present in the input data and to represent 3D spaces, obtaining an induced Delaunay triangulation of the input space. Experiments show how state-of-the-art keypoint detectors improve their performance when the GNG output representation is used as input data. Descriptors extracted on the improved keypoints achieve better matching in robotics applications such as 3D scene registration.
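For readers unfamiliar with GNG, the following minimal sketch shows how a growing network can simultaneously filter and down-sample a noisy 3D cloud: the learned node positions form the reduced representation. The parameters are common GNG defaults, the implementation is a simplification (isolated-node pruning is omitted), and the paper's actual method may differ.

```python
# Minimal Growing Neural Gas sketch for down-sampling a noisy 3D point cloud.
import numpy as np

def gng_downsample(points, max_nodes=64, lam=100, eps_b=0.05, eps_n=0.006,
                   alpha=0.5, decay=0.995, max_age=50, iters=20000, seed=0):
    rng = np.random.default_rng(seed)
    nodes = [points[rng.integers(len(points))].astype(float).copy()
             for _ in range(2)]
    error = [0.0, 0.0]
    edges = {(0, 1): 0}                              # (i, j), i < j -> age

    for t in range(1, iters + 1):
        x = points[rng.integers(len(points))]
        dist = [float(np.sum((x - w) ** 2)) for w in nodes]
        s1, s2 = np.argsort(dist)[:2]
        error[s1] += dist[s1]
        nodes[s1] += eps_b * (x - nodes[s1])         # move winner toward x
        for (i, j) in list(edges):                   # age winner's edges,
            if s1 in (i, j):                         # drag its neighbours
                edges[(i, j)] += 1
                other = j if i == s1 else i
                nodes[other] += eps_n * (x - nodes[other])
        edges[(min(s1, s2), max(s1, s2))] = 0        # refresh winner-pair edge
        edges = {e: a for e, a in edges.items() if a <= max_age}
        if t % lam == 0 and len(nodes) < max_nodes:  # insert a node in the
            q = int(np.argmax(error))                # worst-quantized region
            nbrs = [j if i == q else i for (i, j) in edges if q in (i, j)]
            if nbrs:
                f = max(nbrs, key=lambda n: error[n])
                nodes.append(0.5 * (nodes[q] + nodes[f]))
                error[q] *= alpha
                error[f] *= alpha
                error.append(error[q])
                r = len(nodes) - 1                   # r is the largest index
                edges.pop((min(q, f), max(q, f)), None)
                edges[(q, r)] = 0
                edges[(f, r)] = 0
        error = [e * decay for e in error]           # global error decay
    # isolated-node pruning from the full GNG algorithm is omitted for brevity
    return np.array(nodes)                           # filtered, reduced cloud

cloud = np.random.default_rng(1).normal(size=(5000, 3))  # noisy toy cloud
print(gng_downsample(cloud).shape)                       # -> (64, 3)
```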
Abstract:
A new classification of microtidal sand and gravel beaches with very different morphologies is presented. Fourteen variables were measured in 557 studied transects; among the variables worth highlighting is the depth of the Posidonia oceanica. The classification distinguishes 9 types of beaches: Type 1: Sand and gravel beaches, Type 2: Sand and gravel separated beaches, Type 3: Gravel and sand beaches, Type 4: Gravel and sand separated beaches, Type 5: Pure gravel beaches, Type 6: Open sand beaches, Type 7: Supported sand beaches, Type 8: Bisupported sand beaches and Type 9: Enclosed beaches. Several tools were used for the classification: discriminant analysis, neural networks and Support Vector Machines (SVM), and their results were compared. As there is no theory for deciding which neural network architecture is most suitable for a particular data set, an experimental study was performed with different numbers of neurons in the hidden layer; an architecture with 30 neurons was finally chosen. Different kernels were employed for the SVM (linear, polynomial, radial basis function and sigmoid). The results obtained with discriminant analysis were not as good as those obtained with the other two methods (ANN and SVM), which showed similar success rates.
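The model comparison can be reproduced in outline with scikit-learn as a stand-in for the original tools: an MLP with 30 hidden neurons against SVMs with the four kernels tried. The synthetic data only mimics the shape of the real 557-transect, 14-variable dataset.

```python
# Sketch of the classifier comparison: an ANN with 30 hidden neurons versus
# SVMs with linear, polynomial, RBF and sigmoid kernels, on synthetic data
# shaped like the study's dataset (557 transects, 14 variables, 9 beach types).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(557, 14))                    # 14 morphological variables
y = rng.integers(1, 10, size=557)                 # 9 beach types
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {"ANN (30 hidden neurons)":
              MLPClassifier(hidden_layer_sizes=(30,), max_iter=2000,
                            random_state=0)}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    models[f"SVM ({kernel})"] = SVC(kernel=kernel)

for name, model in models.items():
    clf = make_pipeline(StandardScaler(), model)  # scale, then classify
    clf.fit(X_tr, y_tr)
    print(f"{name}: accuracy = {clf.score(X_te, y_te):.2f}")
```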
Abstract:
In this work we present a semantic framework suitable for use as a support tool for recommender systems. Our purpose is to use the semantic information provided by a set of integrated resources to enrich texts by conducting different NLP tasks: WSD, domain classification, semantic similarity and sentiment analysis. After obtaining the textual semantic enrichment, we are able to recommend similar content or even to rate texts along different dimensions. First, we describe the main characteristics of the integrated semantic resources together with an exhaustive evaluation. Next, we demonstrate the usefulness of our resource in different NLP tasks and campaigns. Moreover, we present a combination of different NLP approaches that provides enough knowledge to be used as a support tool for recommender systems. Finally, we illustrate a case study with information related to movies and TV series to demonstrate that our framework works properly.
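The recommendation step can be illustrated with a minimal sketch in which plain TF-IDF cosine similarity stands in for the semantic enrichment (WSD, domain and sentiment labels) produced by the framework; the catalogue entries are invented.

```python
# Sketch of similar-content recommendation over enriched texts, here
# approximated with TF-IDF vectors and cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

catalogue = [
    "A detective investigates a series of murders in a small town.",
    "Two astronauts struggle to survive after an accident in orbit.",
    "A retired detective is pulled back for one final murder case.",
]

vectorizer = TfidfVectorizer(stop_words="english")
vectors = vectorizer.fit_transform(catalogue)

def recommend(index, k=1):
    sims = cosine_similarity(vectors[index], vectors).ravel()
    sims[index] = -1.0                      # do not recommend the item itself
    return sims.argsort()[::-1][:k]

print(recommend(0))  # -> [2]: the other detective/murder synopsis
```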
Abstract:
Due to confidentiality considerations, the microdata available from the 2011 Spanish Census have been codified at a provincial (NUTS 3) level except when the municipal (LAU 2) population exceeds 20,000 inhabitants (a requirement that is met by less than 5% of all municipalities). For the remainder of the municipalities within a given province, information is only provided for their classification in wide population intervals. These limitations, hampering territorially-focused socio-economic analyses, and more specifically, those related to the labour market, are observed in many other countries. This article proposes and demonstrates an automatic procedure aimed at delineating a set of areas that meet such population requirements and that may be used to re-codify the geographic reference in these cases, thereby increasing the territorial detail at which individual information is available. The method aggregates municipalities into clusters based on the optimisation of a relevant objective function subject to a number of statistical constraints, and is implemented using evolutionary computation techniques. Clusters are defined to fit outer boundaries at the level of labour market areas.
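The optimisation idea can be sketched with a toy mutation-based evolutionary loop that assigns municipalities to clusters so each cluster clears the 20,000-inhabitant threshold. The populations are invented, and the real method additionally optimises a labour-market objective under contiguity and statistical constraints, both omitted here for brevity.

```python
# Toy sketch of the aggregation step: assign municipalities to clusters so
# every cluster exceeds the population threshold, searched with a simple
# mutation-based evolutionary loop.
import random

populations = [1200, 4300, 800, 15200, 2600, 9900, 700, 18400, 3100, 5200]
N_CLUSTERS, MIN_POP = 3, 20000

def fitness(assign):
    totals = [0] * N_CLUSTERS
    for muni, c in enumerate(assign):
        totals[c] += populations[muni]
    shortfall = sum(max(0, MIN_POP - t) for t in totals)
    return -shortfall  # 0 when every cluster meets the population constraint

random.seed(0)
best = [random.randrange(N_CLUSTERS) for _ in populations]
for _ in range(5000):                      # mutate one assignment at a time
    child = best[:]
    child[random.randrange(len(child))] = random.randrange(N_CLUSTERS)
    if fitness(child) >= fitness(best):
        best = child

totals = [sum(p for p, c in zip(populations, best) if c == k)
          for k in range(N_CLUSTERS)]
print(best, totals, fitness(best))
```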