870 results for Semi-supervised segmentation


Relevance:

80.00%

Publisher:

Abstract:

Data classification is a task with high applicability in many areas. Most methods for classification problems found in the literature deal with single-label, or traditional, problems. In recent years, a series of classification tasks has been identified in which samples can be labeled with more than one class simultaneously (multi-label classification). Additionally, these classes can be hierarchically organized (hierarchical classification and hierarchical multi-label classification). In parallel, a newer category of learning has been studied, called semi-supervised learning, which combines labeled data (supervised learning) and unlabeled data (unsupervised learning) during the training phase, thus reducing the need for a large amount of labeled data when only a small set of labeled samples is available. Since both multi-label and hierarchical multi-label classification techniques and semi-supervised learning have shown favorable results, this work proposes and applies semi-supervised learning to hierarchical multi-label classification tasks, so as to efficiently exploit the main advantages of the two areas. An experimental analysis of the proposed methods found that the use of semi-supervised learning in hierarchical multi-label methods presented satisfactory results, since the two approaches produced statistically similar results.

Relevance:

80.00%

Publisher:

Abstract:

Identification and classification of overlapping nodes in networks are important topics in data mining. In this paper, a network-based (graph-based) semi-supervised learning method is proposed. It is based on competition and cooperation among walking particles in a network to uncover overlapping nodes by generating continuous-valued outputs (soft labels), corresponding to the levels of membership from the nodes to each of the communities. Moreover, the proposed method can be applied to detect overlapping data items in a data set of general form, such as a vector-based data set, once it is transformed to a network. Usually, label propagation involves risks of error amplification. In order to avoid this problem, the proposed method offers a mechanism to identify outliers among the labeled data items, and consequently prevents error propagation from such outliers. Computer simulations carried out for synthetic and real-world data sets provide a numeric quantification of the performance of the method. © 2012 Springer-Verlag.
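
As a rough illustration of how a graph-based semi-supervised method can yield soft labels, the sketch below runs plain iterative label propagation on a toy graph. It is not the particle competition-and-cooperation method described in the abstract, and it omits the outlier-detection mechanism; the function name `propagate_labels`, the toy graph, and the iteration count are all illustrative assumptions.

```python
def propagate_labels(adjacency, seeds, n_classes, n_iter=50):
    """Plain label propagation (NOT the particle-based method of the abstract).

    adjacency: dict mapping node -> list of neighbour nodes (undirected graph).
    seeds: dict mapping labeled node -> known class index (kept clamped).
    Returns a dict mapping node -> list of membership levels that sum to 1.
    """
    # Labeled nodes start one-hot, unlabeled nodes start uniform.
    soft = {}
    for node in adjacency:
        if node in seeds:
            soft[node] = [1.0 if c == seeds[node] else 0.0 for c in range(n_classes)]
        else:
            soft[node] = [1.0 / n_classes] * n_classes

    for _ in range(n_iter):
        updated = {}
        for node, neighbours in adjacency.items():
            if node in seeds or not neighbours:
                updated[node] = soft[node]  # clamp seeds; leave isolated nodes alone
                continue
            # Each unlabeled node takes the mean of its neighbours' memberships.
            updated[node] = [
                sum(soft[nb][c] for nb in neighbours) / len(neighbours)
                for c in range(n_classes)
            ]
        soft = updated
    return soft


# Toy graph: two triangles bridged by node 3, which sits between both communities.
graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4],
    4: [3, 5, 6], 5: [4, 6], 6: [4, 5],
}
memberships = propagate_labels(graph, seeds={0: 0, 6: 1}, n_classes=2)
```

In this toy setting the bridging node ends up with roughly equal membership in both communities, which is the kind of continuous-valued output that lets overlapping nodes be flagged.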

Relevance:

80.00%

Publisher:

Abstract:

Aneurysm diameter measurement is quick and easy, but suffers from the pitfalls of being "too rough and ready". When semi-automated segmentation took 7-10 minutes to estimate volume, it was not a practical tool for busy, routine clinical practice. Today, the availability of automatic segmentation in seconds is bound to make volume measurement, along with 3D ultrasonography, the tools of the future. There can be no debate.

Relevance:

80.00%

Publisher:

Abstract:

Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually, and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important techniques and has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from the available training data, and second, classify new, unseen data instances using the learned classifier. Classification is supervised when all class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when all class values are missing from the training data (i.e., unlabeled data). Beyond this taxonomy, the classification problem can be categorized as uni-dimensional or multi-dimensional depending on the number of class variables (one or more, respectively), or as stationary or streaming depending on the characteristics of the data and its underlying rate of change. Throughout this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised uni-dimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we used Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data. 
They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of the human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against commonly used methods for solving the Parkinson’s disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both case studies, results are promising in terms of classification accuracy as well as regarding the analysis of the learned MBC graphical structures, which identify known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their highly rapid generation process and their concept-drifting aspect. That is, the learned concepts and/or the underlying distribution are likely to change and evolve over time, which makes the current classification model out of date and in need of updating. CPL-DS uses the Kullback-Leibler divergence and the bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional or dual. Then, if any drift occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. 
CPL-DS is general, as it can be applied to several classification models. Using two different models, namely, the naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where newly received files should be continuously classified as malware or goodware. Experimental results show that our approach is effective at detecting different kinds of drift in partially labeled data streams and also achieves good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor the concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. An experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
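
To make the drift-monitoring step more concrete, here is a minimal sketch of a Page-Hinkley test applied to a stream of average log-likelihood scores. It uses the variant that signals a sustained drop in the monitored value, on the assumption that drift shows up as a falling log-likelihood. The class name, the `delta` and `threshold` values, and the synthetic scores are assumptions for illustration and are not taken from the thesis. The adaptation step that follows a detected drift (local in LA-MB-MBC, global in GA-MB-MBC) is not shown.

```python
class PageHinkley:
    """Page-Hinkley test for a sustained DROP in a monitored score."""

    def __init__(self, delta=0.05, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change (assumed value)
        self.threshold = threshold  # alarm threshold, often written lambda (assumed value)
        self.n = 0
        self.mean = 0.0             # running mean of the scores seen so far
        self.cumulative = 0.0       # m_t: cumulative deviation from the running mean
        self.maximum = 0.0          # M_t: largest m_t observed so far

    def update(self, score):
        """Feed one score; return True if a drift is signalled."""
        self.n += 1
        self.mean += (score - self.mean) / self.n
        self.cumulative += score - self.mean + self.delta
        self.maximum = max(self.maximum, self.cumulative)
        # A large gap between M_t and m_t means the score has fallen persistently.
        return self.maximum - self.cumulative > self.threshold


# Synthetic average log-likelihoods: stable at -1.0, then dropping to -3.0.
scores = [-1.0] * 100 + [-3.0] * 100
detector = PageHinkley()
first_alarm = next(
    (i for i, s in enumerate(scores) if detector.update(s)), None
)
```

With these assumed settings the alarm fires a few observations after the change point at index 100; a larger `threshold` would trade slower detection for fewer false alarms.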

Relevance:

80.00%

Publisher:

Abstract:

This work proposes an optimization of a semi-supervised Change Detection methodology based on a combination of Change Indices (CI) derived from a multitemporal image data set. For this purpose, SPOT 5 Panchromatic images with 2.5 m spatial resolution have been used, from which three Change Indices have been calculated. Two of them are commonly used indices; the third, however, has been derived from the Kullback-Leibler divergence. These three indices have then been combined into a multiband image that has been used as input for a Support Vector Machine (SVM) classifier, in which four different discriminant functions have been tested in order to differentiate between the change and no_change categories. The performance of the suggested procedure has been assessed by applying different quality measures, reaching highly satisfactory values in each case. These results demonstrate that combining basic change indices with more sophisticated ones such as the Kullback-Leibler distance, and applying non-parametric discriminant functions like those employed in the SVM method, makes it possible to solve a change detection problem efficiently.
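
The abstract does not say how the Kullback-Leibler change index is computed. One plausible reading, sketched below purely as an assumption, compares the grey-level histograms of the same local window at two dates using a symmetrised KL divergence. The function names, the 8-bit value range, the bin count, and the smoothing constant `eps` are all hypothetical choices.

```python
import math
from collections import Counter


def histogram(pixels, n_bins=8, max_value=255, eps=1e-6):
    """Normalised grey-level histogram with additive smoothing (eps is an assumption)."""
    counts = Counter(min(p * n_bins // (max_value + 1), n_bins - 1) for p in pixels)
    total = len(pixels) + eps * n_bins
    return [(counts.get(b, 0) + eps) / total for b in range(n_bins)]


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), for strictly positive p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def kl_change_index(window_t1, window_t2):
    """Symmetrised KL divergence between the histograms of two co-located windows."""
    p, q = histogram(window_t1), histogram(window_t2)
    return kl_divergence(p, q) + kl_divergence(q, p)


# Made-up pixel values: one window that stays dark, one that becomes bright.
unchanged = kl_change_index([10, 12, 11, 13] * 4, [11, 10, 13, 12] * 4)
changed = kl_change_index([10, 12, 11, 13] * 4, [200, 210, 205, 220] * 4)
```

An index of this kind is near zero where the local grey-level distribution is stable and grows where it shifts, so it could serve as one band of the multiband image passed to the classifier.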

Relevance:

80.00%

Publisher:

Abstract:

The assessment of different alternatives in road-corridor planning and layout design must be based on a number of well-defined territorial variables that serve as decision-making criteria, and this requires a high-quality preliminary environmental analysis of those variables. In Spain, feasibility studies for new roads and motorways are associated with a phase of the decision procedure known as the Informative Study, which establishes the physical, environmental, land-use and cultural constraints to be considered in the early stages of defining road corridor layouts. The most common methodology is to establish different levels of Territorial Carrying Capacity (TCC) in the study area in order to summarize the territorial variables on thematic maps and facilitate the tracing process of road-corridor layout alternatives. Landscape is a constraint factor that must be considered in road planning and design, and the most sustainable layouts should be sought based on aesthetic and ecological criteria. 
However, this factor is not often analyzed in Informative Studies and, even when it is, baseline studies on landscape quality (aesthetic and ecological) and landforms do not usually include the recommendations of road tracing guides designed to avoid or reduce impacts on the landscape. The resolution of the landscape maps produced in this type of study does not comply with the road design scale (1:5,000) recommended in the regulations for the Informative Study procedure. Another common shortcoming in road planning is that landscape ecological connectivity is not considered during road design in order to avoid affecting wildlife corridors in the landscape. In the prior road planning stage, this issue could lead to a major barrier effect for fauna dispersal movements and to the fragmentation of their habitat due to the partial or total occupation of habitat patches of biological importance for the fauna (or focal habitats), and the interruption of wildlife corridors that concentrate fauna dispersal movements between patches. The main goal of this dissertation is to improve the study of the landscape and prevent negative effects during the road tracing process, and to facilitate the preservation of wildlife corridors (or green ways) and the location of preventive and corrective measures by selecting and quantifying suitability factors to reduce visual and ecological landscape impacts at a local scale. Specifically, the incorporation of quantitative and well-supported values in the decision-making process provides increased transparency in the road corridor and layout design process. Four specific questions were raised in this research: (1) How are territorial constraints selected and evaluated in terms of landscape by Spanish land-planning practitioners before locating a new road? (2) How can wildlife corridors be defined based on the landscape factors influencing the dispersal movements of fauna? 
(3) How can wildlife corridors be delimited and assessed to include the partially erratic movements of fauna and the barrier effect of anthropic elements at a local scale? (4) How can road design recommendations related to landscape and landforms be included in a Geographic Information System (GIS) model to aid civil engineers during the road layout design process and support sustainable development? This doctoral thesis proposes new methodologies that improve the assessment of the visual and ecological landscape character using indicators and GIS models to obtain road layout alternatives with a lower impact on the landscape. These methodologies were tested on a case study of a heterogeneous landscape with a high density of roe deer (Capreolus capreolus L.), one of the large mammals most commonly hit by vehicles on the Spanish road network, and where a new motorway is planned to pass through the middle of their distribution area. We explored the variables used in 22 road-corridor planning projects sponsored by the Ministry of Public Works between 2006 and 2008. These variables were grouped into physical, environmental, land-use and cultural constraints for the purpose of comparing the TCC values assigned to each variable in the various studies reviewed. As a prior stage in a connectivity analysis, a map of resistance to roe deer dispersal movements was created based on the literature and expert judgment. Using this research as a base, each factor selected to build the matrix was assigned a resistance value, then weighted and combined with the rest of the factors using the analytic hierarchy process (AHP) and fuzzy logic operators as multicriteria assessment (MCA) methods. A GIS methodology was designed to clearly delimit the physical area of wildlife corridors according to a geometric threshold width value, and the multiple potential connections between each pair of habitat patches in the landscape. 
Light Detection and Ranging (LiDAR) data were processed and a GIS model was built to determine landscape quality (aesthetic and ecological), landforms with similar characteristics for the road layout, and the cumulative viewshed of potential drivers and observers in the area surrounding the new motorway. The main contributions of this research to current scientific knowledge in the field of environmental impact assessment for road corridor and layout design are fourfold. First, the analysis of 22 Informative Studies on road planning revealed that the methods applied by practitioners for assessing the TCC were not sufficiently standardized, owing to the lack of uniformity in the cartographic information sources and the TCC valuation methodologies, especially in the analysis of the aesthetic and ecological quality of the landscape. Second, the analysis in this dissertation highlights the importance of multicriteria methods to structure, combine and validate factors that constrain wildlife dispersal movements in the connectivity analysis. Third, the “Generator of Alternative Corridors (GAC)” and “Narrow Corridor Eraser (NCE)” GIS models developed can be applied systematically and on a scientific basis in connectivity analyses to improve existing tools and understand landscape as a network composed of interconnected nodes and links. Thus, alternative corridors with similar probability of use by fauna and without bottlenecks can be obtained by iteratively running the GAC and NCE models. Fourth, our case study of new motorway corridor and layout design innovatively included semi-supervised classification of landforms, filtering of LiDAR point clouds and new 3D road geometry on the Digital Surface Model (DSM). 
The combined use of LiDAR data processing and geomorphological indices and classifications can help decision-makers assess which road layouts produce lower impacts on the landscape, provide an overall insight into the most commonly applied value judgments and, in conclusion, define which corrective landscaping measures should be applied, and where.

Relevance:

80.00%

Publisher:

Abstract:

Text classification is essential for narrowing down the number of documents relevant to a particular topic for further review, especially when searching through large biomedical databases. Protein-protein interactions are an example of such a topic, with databases devoted specifically to them. This paper proposes a semi-supervised learning algorithm via local learning with class priors (LL-CP) for biomedical text classification, in which unlabeled data points are classified in a vector space based on their proximity to labeled nodes. The algorithm has been evaluated on a corpus of biomedical documents to identify abstracts containing information about protein-protein interactions, with promising results. Experimental results show that LL-CP outperforms traditional semi-supervised learning algorithms such as SVM, and that it also performs better than local learning without class priors.

Relevance:

80.00%

Publisher:

Abstract:

Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May, 2014

Relevance:

80.00%

Publisher:

Abstract:

The popularity of Computing degrees in the UK has been increasing significantly over the past number of years. In Northern Ireland, from 2007 to 2015, there has been a 40% increase in acceptances to Computer Science degrees with England seeing a 60% increase over the same period (UCAS, 2016). However, this is tainted as Computer Science degrees also continue to maintain the highest dropout rates.
At Queen’s University Belfast we currently have a Level 1 intake of over 400 students across a number of computing pathways. Our drive as staff is to empower and motivate the students to fully engage with the course content. All students take a Java programming module, the aim of which is to provide an understanding of the basic principles of object-oriented design. In order to assess these skills, we have developed Jigsaw Java as an innovative assessment tool offering intelligent, semi-supervised automated marking of code.
Jigsaw Java allows students to answer programming questions using a drag-and-drop interface to place code fragments into position. Their answer is compared to the sample solution and, if it matches, marks are allocated accordingly. However, if a match is not found, the corresponding code is executed using sample data to determine whether its logic is acceptable. If it is, the solution is flagged to be checked by staff and, if satisfactory, is saved as an alternative solution. This means that appropriate marks can be allocated and, should another student submit the same placement of code fragments, it does not need to be executed or checked again. Rather, the system now knows how to assess it.
Jigsaw Java is also able to consider partial marks dependent on code placement and will “learn” over time. Given the number of students, Jigsaw Java will improve the consistency and timeliness of marking.
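
The marking flow described in this abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the Jigsaw Java implementation: the class name `JigsawMarker`, the callbacks `passes_sample_data` and `staff_approves`, and the `full_marks` value are all hypothetical, and partial marks are left out.

```python
from typing import Callable, Sequence, Set, Tuple

Placement = Tuple[str, ...]


class JigsawMarker:
    """Toy marker that remembers every fragment arrangement accepted so far."""

    def __init__(
        self,
        sample_solution: Sequence[str],
        passes_sample_data: Callable[[Placement], bool],  # hypothetical: runs the code
        staff_approves: Callable[[Placement], bool],      # hypothetical: manual check
        full_marks: int = 10,                             # hypothetical mark scheme
    ) -> None:
        # The sample solution is the only arrangement known at the start.
        self.accepted: Set[Placement] = {tuple(sample_solution)}
        self.passes_sample_data = passes_sample_data
        self.staff_approves = staff_approves
        self.full_marks = full_marks

    def mark(self, placement: Sequence[str]) -> int:
        key = tuple(placement)
        # Step 1: a known solution is marked without being executed.
        if key in self.accepted:
            return self.full_marks
        # Step 2: otherwise execute on sample data, then ask staff to confirm.
        if self.passes_sample_data(key) and self.staff_approves(key):
            # Step 3: store it so the same placement is never re-checked.
            self.accepted.add(key)
            return self.full_marks
        return 0
```

Keeping accepted arrangements in a set is one simple way to obtain the "learning" behaviour the abstract mentions: the second student to submit an approved alternative is marked by a lookup alone.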

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Objective: to determine the effect of vestibular rehabilitation therapy (VRT) and human movement on the disability index of adult patients with benign paroxysmal positional vertigo (BPPV). Subjects: six subjects with an average age of 49.5 ± 14.22 years who had been diagnosed with BPPV by an otolaryngologist. Instruments: the Dizziness Handicap Inventory and a questionnaire to determine the impact on the quality of life of patients with this pathology (Ceballos and Vargas, 2004). Procedure: subjects underwent vestibular therapy for four weeks together with habituation and balance exercises in a semi-supervised manner. Two measurements were performed, one before and one after the vestibular therapy, and the researchers determined whether there was any improvement in the physical, functional, and emotional dimensions. Statistical analysis: descriptive statistics and Student’s t-test for repeated measures were applied to analyze the results obtained. Results: statistically significant differences were found in the physical dimension between the pre-test (19.33 ± 4.67 points) and post-test (13 ± 7.24 points) (t = 2.65; p < 0.05). In contrast, no statistically significant differences were found in the functional (t = 2.44; p > 0.05), emotional (t = 2.37; p > 0.05) or general dimensions (t = 2.55; p > 0.05). Conclusion: vestibular therapy with a semi-supervised human movement program improved the index of disability due to vertigo (physical dimension) in BPPV subjects.
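
The reported significance levels can be checked against the t distribution. The sketch below assumes a two-tailed test with 5 degrees of freedom (six paired subjects minus one); the abstract states neither, so both are assumptions. The function names are illustrative, and the integration uses a plain Simpson rule to stay within the standard library.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 10_000) -> float:
    """Two-tailed p-value, integrating the density over [0, |t|] with Simpson's rule."""
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # P(0 < T < |t|)
    return 1 - 2 * area           # P(|T| > |t|)


DF = 5  # assumption: n - 1 for six paired subjects
for name, t in [("physical", 2.65), ("functional", 2.44),
                ("emotional", 2.37), ("general", 2.55)]:
    print(f"{name}: p = {two_tailed_p(t, DF):.4f}")
```

Under these assumptions only the physical dimension falls below the 0.05 threshold, which agrees with the pattern reported in the abstract.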

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Efficient unsupervised training and inference in deep generative models remains a challenging problem. One fairly simple approach, the Helmholtz machine, consists of training a top-down directed generative model that is later used for approximate inference. Recent results suggest that better generative models can be obtained through better approximate inference procedures. Instead of improving the inference procedure, we propose here a new model, the bidirectional Helmholtz machine (BiHM), which guarantees that the top-down and bottom-up distributions can be computed efficiently. We achieve this by interpreting both the top-down and bottom-up models as approximate inference distributions, and then defining the model distribution as the geometric mean of these two distributions. We derive a lower bound on the likelihood of this model and show that optimizing this bound acts as a regularizer, such that the Bhattacharyya distance between the approximate top-down and bottom-up distributions is minimized. This approach yields state-of-the-art generative models that favor significantly deeper networks. It also improves approximate inference by several orders of magnitude. In addition, we introduce a deep generative model based on BiHMs for semi-supervised training.
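
To make the two quantities named in this abstract concrete, the following toy example computes the normalized geometric mean of two small discrete distributions and the Bhattacharyya distance between them. It is only an illustration on hand-picked probability vectors: in the thesis the distributions are defined by deep directed models over observed and latent variables, and the function names here are not taken from it.

```python
import math
from typing import List, Sequence


def bhattacharyya_coefficient(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum over outcomes of sqrt(p_i * q_i); equals 1 only when p and q coincide."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Negative log of the Bhattacharyya coefficient."""
    return -math.log(bhattacharyya_coefficient(p, q))


def geometric_mean_distribution(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Normalized geometric mean sqrt(p_i * q_i) / Z of two distributions."""
    z = bhattacharyya_coefficient(p, q)  # the normalizer is the coefficient itself
    return [math.sqrt(a * b) / z for a, b in zip(p, q)]


top_down = [0.2, 0.3, 0.5]    # illustrative values only
bottom_up = [0.5, 0.3, 0.2]   # illustrative values only
print(geometric_mean_distribution(top_down, bottom_up))
print(bhattacharyya_distance(top_down, bottom_up))
```

The normalizer of the geometric mean is exactly the Bhattacharyya coefficient, so pulling the two distributions together (a smaller distance) pushes the normalizer toward 1.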

Relevância:

40.00% 40.00%

Publicador:

Resumo:

One of the current frontiers in the clinical management of Pectus Excavatum (PE) patients is the prediction of the surgical outcome prior to the intervention. This can be done through computerized simulation of the Nuss procedure, which requires an anatomically correct representation of the costal cartilage. To this end, we take advantage of the tubular structure of the costal cartilage to detect it through multi-scale vesselness filtering. This information is then used in an interactive 2D initialization procedure, which uses anatomical maximum intensity projections of 3D vesselness feature images to efficiently initialize the 3D segmentation process. We identify the cartilage tissue centerlines in these projected 2D images using a livewire approach. Finally, we refine the 3D cartilage surface through region-based sparse field level-sets. We tested the proposed algorithm on 6 non-contrast CT datasets from PE patients. Good segmentation performance was found against reference manual contouring, with an average Dice coefficient of 0.75±0.04 and an average mean surface distance of 1.69±0.30 mm. The proposed method requires roughly 1 minute for the interactive initialization step, which can contribute to wider use of this tool in clinical practice, since current manual delineation of the costal cartilage can take up to an hour.
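
For readers unfamiliar with the Dice coefficient reported above, here is a minimal sketch of how it is computed for two binary masks. The masks are flattened to plain lists, and returning 1.0 for two empty masks is a convention chosen for this example; the abstract does not describe the evaluation code.

```python
from typing import Sequence


def dice_coefficient(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two flattened binary masks."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for a, b in zip(predicted, reference) if a and b)
    size_sum = sum(1 for a in predicted if a) + sum(1 for b in reference if b)
    if size_sum == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2.0 * overlap / size_sum


automatic = [1, 1, 1, 0, 0, 0]  # illustrative voxels only
manual = [0, 1, 1, 1, 0, 0]     # illustrative voxels only
print(dice_coefficient(automatic, manual))
```

A value of 1 means the automatic and manual contours coincide exactly, and 0 means they share no voxels.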

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This work presents an efficient method for volume rendering of glioma tumors from segmented 2D MRI datasets with interactive user control, replacing the manual segmentation required by state-of-the-art methods. Gliomas, which evolve from the cerebral supportive cells, are the most common primary brain tumors. For clinical follow-up, the evaluation of the pre-operative tumor volume is essential. Tumor portions were automatically segmented from 2D MR images using morphological filtering techniques. These segmented tumor slices were propagated and modeled with the software package. The 3D modeled tumor consists of the gray level values of the original image with the exact tumor boundary. Axial slices of FLAIR and T2-weighted images were used for extracting tumors. Volumetric assessment of a tumor by manual segmentation of its outlines is time-consuming and prone to error; the proposed method overcomes these drawbacks. We verified the performance of our method on several sets of MRI scans. The 3D modeling was also done using segmented 2D slices with the help of a medical software package called 3D DOCTOR for verification purposes. The results were validated against the ground truth models by a radiologist.
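
One common way to turn a stack of segmented slices into a volume estimate is voxel counting, sketched below. This is an assumption for illustration: the abstract does not say how the method or 3D DOCTOR computes volume, and the function name, pixel area and slice spacing are all hypothetical.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # one binary slice: rows of 0/1 pixels


def tumor_volume_mm3(slices: Sequence[Mask], pixel_area_mm2: float,
                     slice_spacing_mm: float) -> float:
    """Estimate volume as (number of tumor pixels) * pixel area * slice spacing."""
    tumor_pixels = sum(
        1 for mask in slices for row in mask for value in row if value
    )
    return tumor_pixels * pixel_area_mm2 * slice_spacing_mm


# Two tiny hypothetical slices with 3 and 1 tumor pixels respectively.
stack = [
    [[1, 1], [1, 0]],
    [[0, 1], [0, 0]],
]
print(tumor_volume_mm3(stack, pixel_area_mm2=0.25, slice_spacing_mm=5.0))
```

The estimate depends directly on the accuracy of each slice mask, which is why errors in manual outlining carry over into the volumetric assessment.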

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This thesis focuses on automating the time-consuming task of manually counting activated neurons in fluorescence microscopy images, a task used to study the mechanisms underlying torpor. The traditional method of manual annotation can introduce bias and delay the outcome of experiments, so the author investigates a deep-learning-based procedure to automate this task. The author explores two of the main state-of-the-art convolutional neural network (CNN) architectures, UNet and the ResUnet family, and uses a counting-by-segmentation strategy to provide a justification of the objects considered during the counting process. The author also explores a weakly-supervised learning strategy that exploits only dot annotations. The author quantifies the advantages, in terms of data reduction and counting performance, obtainable with a transfer-learning approach and, specifically, a fine-tuning procedure. The author released the dataset used for the supervised use case and all the pre-trained models, and designed a web application to share both the counting pipeline developed in this work and the models pre-trained on the dataset analyzed in this work.
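
The counting step of a counting-by-segmentation strategy can be illustrated as counting connected components in the binary mask a segmentation network produces. The sketch below is a generic version under stated assumptions: it uses 4-connectivity and no post-processing, and the thesis pipeline may differ on both points.

```python
from collections import deque
from typing import Sequence


def count_objects(mask: Sequence[Sequence[int]]) -> int:
    """Count 4-connected foreground regions in a binary mask using breadth-first search."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # A new, unvisited foreground pixel starts a new object.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


predicted_mask = [  # illustrative mask only
    [1, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 1],
]
print(count_objects(predicted_mask))
```

Because every counted object corresponds to a visible region of the mask, the count can be inspected and justified, which is the advantage the abstract attributes to this strategy.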