790 results for Learning techniques
Abstract:
Nowadays the number of hip joint arthroplasty operations continues to increase because the elderly population is growing. Moreover, global life expectancy is increasing and people are adopting more active lifestyles. For these reasons, implant revision operations are becoming more frequent. The procedure involves surgically removing the old implant and replacing it with a new one. Every time a new implant is inserted, it alters the internal strain distribution of the femur, jeopardizing the remodeling process and potentially causing bone tissue loss. This is of major concern, particularly in the proximal Gruen zones, which are considered critical for implant stability and longevity. Different implant designs are available on the market today; however, there is no clear understanding of which implant design parameters achieve optimal mechanical conditions. The aim of the study is to investigate the stress shielding effect that different implant design parameters produce in the proximal femur, and to determine which ranges of those parameters lead to the most physiological conditions.
Abstract:
Important food crops like rice are constantly exposed to various stresses that can have devastating effects on their survival and productivity. Being sessile, these highly evolved organisms have developed elaborate molecular machinery to sense a mixture of stress signals and elicit a precise response that minimizes damage. However, recent discoveries have revealed that the interplay of these stress regulatory and signaling molecules is highly complex and remains largely unknown. In this work, we conducted a large-scale analysis of differential gene expression using advanced computational methods to dissect the regulation of stress response, which is at the heart of all molecular changes leading to the observed phenotypic susceptibility. One of the most important stress conditions in terms of productivity loss is drought. We performed genomic and proteomic analysis of epigenetic and miRNA mechanisms in the regulation of drought-responsive genes in rice and found subsets of genes with striking properties. Overexpressed genesets included a higher number of epigenetic marks, miRNA targets and transcription factors that regulate drought tolerance. Underexpressed genesets, on the other hand, were poor in these features but rich in metabolic genes with multiple co-expression partners, which contribute substantially to drought resistance. Identifying and characterizing the patterns exhibited by differentially expressed genes holds the key to uncovering the synergistic and antagonistic components of the cross talk between stress response mechanisms. We performed a meta-analysis of drought and bacterial stresses in rice and Arabidopsis and identified hundreds of shared genes. We found a high level of conservation of gene expression between these stresses. Weighted co-expression network analysis detected two tight clusters of genes, made up of master transcription factors and signaling genes, showing strikingly opposite expression status.
To comprehensively identify the stress-responsive genes shared between multiple abiotic and biotic stresses in rice, we performed separate meta-analyses of microarray studies from seven abiotic and six biotic stresses and found more than thirteen hundred shared stress-responsive genes. Using these genes, various machine learning techniques classified the stresses with high accuracy, both into two major classes (abiotic and biotic stresses) and into multiple classes of individual stresses, and identified the top genes showing distinct patterns of expression. Functional enrichment and co-expression network analysis revealed the different roles of plant hormones and transcription factors in conserved and non-conserved genesets in the regulation of stress response.
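The weighted co-expression network analysis mentioned above usually starts from pairwise expression correlations raised to a soft-thresholding power. The sketch below illustrates only that first step, assuming an unsigned network, Pearson correlation and the frequently used power beta = 6; the gene names and expression values are hypothetical, and the abstracts do not state which settings the authors used.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(x_i, x_j)| ** beta (diagonal set to 1)."""
    genes = list(expr)
    adj = {g: {h: 1.0 if g == h else 0.0 for h in genes} for g in genes}
    for g, h in combinations(genes, 2):
        adj[g][h] = adj[h][g] = abs(pearson(expr[g], expr[h])) ** beta
    return adj


def connectivity(adj):
    """Whole-network connectivity k_i = sum of a_ij over all j != i."""
    return {g: sum(w for h, w in row.items() if h != g) for g, row in adj.items()}


# Hypothetical expression values for four genes across four samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],  # perfectly anti-correlated with geneA
    "geneD": [1.0, 3.0, 2.0, 4.0],  # correlation 0.8 with geneA
}
adj = soft_threshold_adjacency(expr)
k = connectivity(adj)
```

Because the adjacency is unsigned, the anti-correlated pair geneA/geneC gets the same weight as the correlated pair geneA/geneB. A signed network would instead use ((1 + cor) / 2) ** beta, which keeps opposite expression patterns apart.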
Abstract:
BACKGROUND Clinical prognostic groupings for localised prostate cancers are imprecise, with 30-50% of patients recurring after image-guided radiotherapy or radical prostatectomy. We aimed to test combined genomic and microenvironmental indices in prostate cancer to improve risk stratification and complement clinical prognostic factors. METHODS We used DNA-based indices alone or in combination with intra-prostatic hypoxia measurements to develop four prognostic indices in 126 low-risk to intermediate-risk patients (Toronto cohort) who were to receive image-guided radiotherapy. We validated these indices in two independent cohorts of 154 (Memorial Sloan Kettering Cancer Center [MSKCC] cohort) and 117 (Cambridge cohort) radical prostatectomy specimens from low-risk to high-risk patients. We applied unsupervised and supervised machine learning techniques to the copy-number profiles of 126 pre-image-guided radiotherapy diagnostic biopsies to develop prognostic signatures. Our primary endpoint was the development of a set of prognostic measures capable of stratifying patients for risk of biochemical relapse 5 years after primary treatment. FINDINGS Biochemical relapse was associated with indices of tumour hypoxia, genomic instability, and genomic subtypes based on multivariate analyses. We identified four genomic subtypes for prostate cancer, which had different 5-year biochemical relapse-free survival. Genomic instability is prognostic for relapse in both image-guided radiotherapy (multivariate analysis hazard ratio [HR] 4·5 [95% CI 2·1-9·8]; p=0·00013; area under the receiver operating characteristic curve [AUC] 0·70 [95% CI 0·65-0·76]) and radical prostatectomy (4·0 [1·6-9·7]; p=0·0024; AUC 0·57 [0·52-0·61]) patients with prostate cancer, and its effect is magnified by intratumoral hypoxia (3·8 [1·2-12]; p=0·019; AUC 0·67 [0·61-0·73]).
A novel 100-loci DNA signature accurately classified treatment outcome in the MSKCC low-risk to intermediate-risk cohort (multivariate analysis HR 6·1 [95% CI 2·0-19]; p=0·0015; AUC 0·74 [95% CI 0·65-0·83]). In the independent MSKCC and Cambridge cohorts, this signature identified low-risk to high-risk patients who were most likely to fail treatment within 18 months (combined cohorts multivariate analysis HR 2·9 [95% CI 1·4-6·0]; p=0·0039; AUC 0·68 [95% CI 0·63-0·73]), and was better at predicting biochemical relapse than 23 previously published RNA signatures. INTERPRETATION This is the first study of cancer outcome to integrate DNA-based and microenvironment-based failure indices to predict patient outcome. Patients exhibiting these aggressive features after biopsy should be entered into treatment intensification trials. FUNDING Movember Foundation, Prostate Cancer Canada, Ontario Institute for Cancer Research, Canadian Institute for Health Research, NIHR Cambridge Biomedical Research Centre, The University of Cambridge, Cancer Research UK, Cambridge Cancer Charity, Prostate Cancer UK, Hutchison Whampoa Limited, Terry Fox Research Institute, Princess Margaret Cancer Centre Foundation, PMH-Radiation Medicine Program Academic Enrichment Fund, Motorcycle Ride for Dad (Durham), Canadian Cancer Society.
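The AUC values reported above can be read as the probability that a randomly chosen patient who relapsed receives a higher risk score than one who did not, so 0.5 is chance level and 1.0 is a perfect ranking. The minimal sketch below computes this rank-based AUC for a binary outcome; the scores and labels are invented, and the study's exact AUC procedure for relapse data is not described in the abstract, so treat this as an illustration of the metric only.

```python
def auc_mann_whitney(scores, labels):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores; label 1 = biochemical relapse, 0 = no relapse.
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
auc = auc_mann_whitney(scores, labels)  # 5 of 6 relapse/no-relapse pairs ranked correctly
```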
Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolutional Neural Network
Abstract:
Automated tissue characterization is one of the most crucial components of a computer-aided diagnosis (CAD) system for interstitial lung diseases (ILDs). Although much research has been conducted in this field, the problem remains challenging. Deep learning techniques have recently achieved impressive results in a variety of computer vision problems, raising expectations that they might be applied in other domains, such as medical image analysis. In this paper, we propose and evaluate a convolutional neural network (CNN) designed for the classification of ILD patterns. The proposed network consists of 5 convolutional layers with 2×2 kernels and LeakyReLU activations, followed by average pooling with a size equal to that of the final feature maps, and three dense layers. The last dense layer has 7 outputs, one for each class considered: healthy, ground glass opacity (GGO), micronodules, consolidation, reticulation, honeycombing and a combination of GGO/reticulation. To train and evaluate the CNN, we used a dataset of 14696 image patches derived from 120 CT scans from different scanners and hospitals. To the best of our knowledge, this is the first deep CNN designed for this specific problem. A comparative analysis on a challenging dataset demonstrated the effectiveness of the proposed CNN against previous methods. The classification performance (~85.5%) shows the potential of CNNs in analyzing lung patterns. Future work includes extending the CNN to three-dimensional data provided by CT volume scans and integrating the proposed method into a CAD system that aims to provide differential diagnosis for ILDs as a supportive tool for radiologists.
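To make the architecture description concrete, the pure-Python sketch below traces a single-channel patch through five 2×2 convolutions with LeakyReLU and an average pooling window equal to the final feature-map size, then maps seven logits to class probabilities. The 32×32 input size, one filter per layer, stride 1 without padding, and the LeakyReLU slope of 0.3 are assumptions for illustration; the real network uses many filters per layer and three dense layers, which are omitted here.

```python
import math

# Assumptions (not stated in the abstract): 32x32 single-channel input patch,
# one filter per layer, stride 1, no padding, LeakyReLU slope 0.3.
CLASSES = ["healthy", "GGO", "micronodules", "consolidation",
           "reticulation", "honeycombing", "GGO/reticulation"]


def conv2d_valid(img, kernel):
    """2-D cross-correlation with stride 1 and no padding ('valid')."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(img[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def leaky_relu(v, alpha=0.3):
    return v if v > 0 else alpha * v


def conv_stack_features(patch, kernels, alpha=0.3):
    """Stacked 2x2 conv + LeakyReLU layers, then average pooling over the whole final map."""
    x = patch
    for k in kernels:
        x = [[leaky_relu(v, alpha) for v in row] for row in conv2d_valid(x, k)]
    h, w = len(x), len(x[0])
    pooled = sum(sum(row) for row in x) / (h * w)  # pool size == feature-map size
    return pooled, (h, w)


def softmax(logits):
    """Turn the 7 outputs of the last dense layer into class probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


patch = [[1.0] * 32 for _ in range(32)]
kernels = [[[0.25, 0.25], [0.25, 0.25]] for _ in range(5)]
feature, final_shape = conv_stack_features(patch, kernels)  # final_shape == (27, 27)
probs = softmax([0.0] * len(CLASSES))
```

Each valid 2×2 convolution shrinks the map by one pixel in each dimension, so under these assumptions a 32×32 patch ends as a 27×27 map, and pooling over the whole map reduces each feature map to a single value before the dense layers.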
Abstract:
Problem: Medical and veterinary students memorize facts but then have difficulty applying those facts in clinical problem solving. Cognitive engineering research suggests that the inability of medical and veterinary students to infer concepts from facts may be due in part to specific features of how information is represented and organized in educational materials. First, physical separation of pieces of information may increase the cognitive load on the student. Second, information that is necessary but not explicitly stated may also contribute to the student's cognitive load. Finally, the type of representation (textual or graphical) may also support or hinder the student's learning process. This may explain why students have difficulty applying biomedical facts in clinical problem solving. Purpose: To test the hypothesis that three specific aspects of expository text (the spatial distance between the facts needed to infer a rule, the explicitness of information, and the format of representation) affect the ability of students to solve clinical problems. Setting: The study was conducted in the parasitology laboratory of a college of veterinary medicine in Texas. Sample: The study subjects were a convenience sample of 132 second-year veterinary students who matriculated in 2007. The ages of this class upon admission ranged from 20 to 52, and the class was approximately 75% female and 25% male. Results: No statistically significant difference in student ability to solve clinical problems was found when relevant facts were placed in proximity, nor when an explicit rule was stated. Further, no statistically significant difference in student ability to solve clinical problems was found when students were given different representations of material, including tables and concept maps.
Findings: The findings from this study indicate that the three properties investigated – proximity, explicitness, and representation – had no statistically significant effect on student learning as it relates to clinical problem-solving ability. However, ad hoc observations as well as findings from other researchers suggest that the subjects were probably using rote learning techniques such as memorization, and therefore were not attempting to infer relationships from the factual material in the interventions, unless they were specifically prompted to look for patterns. A serendipitous finding unrelated to the study hypothesis was that those subjects who correctly answered questions regarding functional (non-morphologic) properties, such as mode of transmission and intermediate host, at the family taxonomic level were significantly more likely to correctly answer clinical case scenarios than were subjects who did not correctly answer questions regarding functional properties. These findings suggest a strong relationship (p < .001) between well-organized knowledge of taxonomic functional properties and clinical problem solving ability. Recommendations: Further study should be undertaken investigating the relationship between knowledge of functional taxonomic properties and clinical problem solving ability. In addition, the effect of prompting students to look for patterns in instructional material, followed by the effect of factors that affect cognitive load such as proximity, explicitness, and representation, should be explored.
Abstract:
Training and assessment paradigms for laparoscopic surgical skills are evolving from traditional mentor–trainee tutorship towards structured, more objective and safer programs. Accreditation of surgeons requires reaching a consensus on the metrics and tasks used to assess surgeons' psychomotor skills. Ongoing development of tracking systems and software solutions has allowed novel means of training and assessment in laparoscopy to expand. The current challenge is to adapt and include these systems within training programs and to exploit their possibilities for evaluation purposes. This paper describes the state of the art in research on measuring and assessing psychomotor laparoscopic skills. It gives an overview of tracking systems as well as of the metrics and advanced statistical and machine learning techniques employed for evaluation purposes. The latter could be used as an aid in deciding on a surgeon's competence level, which is an important consideration for the accreditation of surgeons in particular and for patient safety in general. The potential of these methods and tools makes them complementary means for the surgical assessment of motor skills, especially in the early stages of training. Successful examples such as the Fundamentals of Laparoscopic Surgery should help drive a paradigm change towards structured curricula based on objective parameters. These may improve the accreditation of new surgeons, as well as optimize their already overloaded training schedules.
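Among the metrics such tracking systems make possible, motion-analysis measures computed from instrument-tip positions are common. The sketch below shows three simple ones (path length, a straight-line-to-path ratio as a rough economy-of-motion score, and mean speed); the function names, the fixed sampling interval and the sample trajectory are hypothetical, and exact metric definitions vary between studies.

```python
import math


def path_length(positions):
    """Total distance travelled by the tracked instrument tip (sum of segment lengths)."""
    return sum(math.dist(p, q) for p, q in zip(positions, positions[1:]))


def economy_of_motion(positions):
    """Straight start-to-end distance divided by path length (1.0 means no wasted motion)."""
    total = path_length(positions)
    if total == 0:
        return 1.0
    return math.dist(positions[0], positions[-1]) / total


def mean_speed(positions, dt):
    """Average speed given a fixed sampling interval dt (seconds) between positions."""
    duration = dt * (len(positions) - 1)
    return path_length(positions) / duration if duration > 0 else 0.0


# Hypothetical tip positions in millimetres, sampled every 0.5 s.
track = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
```

For this track the tip travels 5 mm and then 12 mm (17 mm in total), while the straight start-to-end distance is 13 mm, giving an economy score of 13/17.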
Abstract:
This paper describes a stress detection system based on fuzzy logic and two physiological signals: Galvanic Skin Response and Heart Rate. Instead of providing a global stress classification, this approach creates individual stress templates that capture the behaviour of each individual in situations with different degrees of stress. Evaluated on a database of 80 individuals, the proposed method detects stress correctly at a rate of 99.5%. This result improves on former approaches in the literature and on well-known machine learning techniques such as SVM, k-NN, GMM and Linear Discriminant Analysis. Finally, the proposed method is highly suitable for real-time applications.
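As an illustration of how a person-specific fuzzy template might work, the sketch below builds ramp membership functions for "GSR is high" and "HR is high" from one individual's rest and stress levels and combines them with a minimum (AND) rule. The template fields, the ramp shape, the min rule and the 0.5 decision threshold are all assumptions; the abstract does not describe the paper's actual membership functions or rule base.

```python
from dataclasses import dataclass


@dataclass
class StressTemplate:
    """Hypothetical per-person template: mean GSR (skin conductance, uS) and HR (bpm)."""
    gsr_rest: float
    gsr_stress: float
    hr_rest: float
    hr_stress: float


def rising(x, lo, hi):
    """Ramp membership: 0 at or below lo, 1 at or above hi, linear in between (assumes hi > lo)."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def stress_degree(t, gsr, hr):
    """Rule 'IF GSR is high AND HR is high THEN stressed', with AND as the minimum t-norm."""
    mu_gsr = rising(gsr, t.gsr_rest, t.gsr_stress)
    mu_hr = rising(hr, t.hr_rest, t.hr_stress)
    return min(mu_gsr, mu_hr)


def is_stressed(t, gsr, hr, threshold=0.5):
    """Crisp decision from the fuzzy stress degree (threshold is an assumed value)."""
    return stress_degree(t, gsr, hr) >= threshold


person = StressTemplate(gsr_rest=2.0, gsr_stress=6.0, hr_rest=70.0, hr_stress=100.0)
```

Because the memberships are anchored to each person's own rest and stress values, the same raw heart rate can map to different stress degrees for different people, which is the point of individual templates. GSR is assumed here to be measured as skin conductance, which rises under stress; if skin resistance were used, the ramp would have to be reversed.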
Abstract:
The Semantic Web aims to allow machines to make inferences using the explicit conceptualisations contained in ontologies. By pointing to ontologies, Semantic Web-based applications are able to interoperate and share common information easily. Nevertheless, multilingual semantic applications are still rare, because most online ontologies are monolingual, in English. To solve this issue, techniques for ontology localisation and translation are needed. However, traditional machine translation is difficult to apply to ontologies, because ontology labels tend to be quite short and linguistically different from the free-text paradigm. In this paper, we propose an approach to enhance the machine translation of ontologies by exploiting the well-structured concept descriptions contained in the ontology. In particular, our approach leverages the semantics contained in the ontology by using Cross Lingual Explicit Semantic Analysis (CLESA) for context-based disambiguation in phrase-based Statistical Machine Translation (SMT). The presented work is novel in that, to the best of our knowledge, CLESA has not previously been applied in SMT.
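In Explicit Semantic Analysis, a text is represented as a weighted vector over Wikipedia concepts, and the cross-lingual variant compares texts in different languages through concepts linked across language editions. The toy sketch below shows only the disambiguation idea: pick the translation candidate whose concept vector is most similar (by cosine) to the context of the source label. The concept IDs, weights and candidate words are made up, and this is not the paper's integration into a phrase-based SMT decoder.

```python
import math


def cosine(u, v):
    """Cosine similarity of sparse concept vectors given as {concept: weight} dicts."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def pick_translation(context_vec, candidate_vecs):
    """Choose the target-language candidate whose concept vector best matches the source context."""
    return max(candidate_vecs, key=lambda cand: cosine(context_vec, candidate_vecs[cand]))


# Toy, hand-made vectors over language-independent concept IDs (hypothetical).
context = {"Finance": 0.8, "Money": 0.6}  # e.g. from the English label "bank" in a finance ontology
candidates = {
    "banco": {"Finance": 0.7, "Money": 0.5, "Furniture": 0.3},
    "orilla": {"River": 0.9, "Geography": 0.4},
}
best = pick_translation(context, candidates)
```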
Abstract:
Machine learning techniques are used to extract valuable knowledge from data. Nowadays, these techniques are becoming even more important due to the evolution in data acquisition and storage, which is producing data with different characteristics that must be exploited. Therefore, advances in data collection must be accompanied by advances in machine learning techniques to solve the new challenges that might arise, in both academic and real-world applications. There are several machine learning techniques, depending on both data characteristics and purpose. Unsupervised classification, or clustering, is one of the best-known techniques when data lack supervision (unlabeled data) and the aim is to discover data groups (clusters) according to their similarity. On the other hand, supervised classification needs data with supervision (labeled data), and its aim is to predict the labels of new data. The presence of data labels is a very important characteristic that guides not only the learning task but also other related tasks such as validation. When only some of the available data are labeled while the others remain unlabeled (partially labeled data), neither clustering nor supervised classification can be used. This scenario, which is becoming common nowadays because labels are unknown or costly to obtain, is tackled with semi-supervised learning techniques. This thesis focuses on the branch of semi-supervised learning closest to clustering, i.e., discovering clusters using the available labels as support to guide and improve the clustering process. Another important data characteristic, different from the presence of data labels, is the relevance of data features. Data are characterized by features, but not all of them may be relevant, or equally relevant, to the learning process.
A recent clustering trend related to data relevance, called subspace clustering, holds that different clusters might be described by different feature subsets. This differs from traditional solutions to the data relevance problem, where a single feature subset (usually the complete set of original features) is found and used to perform the clustering process. The proximity of this work to clustering leads to the first goal of this thesis. As commented above, clustering validation is a difficult task due to the absence of data labels. Although many indices can be used to assess the quality of clustering solutions, these validations depend on the clustering algorithms and the data characteristics. Hence, for the first goal, three well-known clustering algorithms are used to cluster data with outliers and noise, in order to critically study how some of the best-known validation indices behave. The main goal of this work, however, is to combine semi-supervised clustering with subspace clustering to obtain clustering solutions that can be correctly validated using either known indices or expert opinions. Two algorithms are proposed, from different points of view, to discover clusters characterized by different subspaces. The first algorithm uses the available data labels to search for subspaces first, before searching for clusters. This algorithm assigns each instance to only one cluster (hard clustering) and is based on mapping known labels to subspaces using supervised classification techniques. The subspaces are then used to find clusters with traditional clustering techniques. The second algorithm uses the available data labels to search for subspaces and clusters at the same time, in an iterative process. This algorithm assigns each instance to each cluster with a membership probability (soft clustering) and is based on integrating the known labels and the subspace search into a model-based clustering approach.
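The general idea of using a few labels to guide clustering can be illustrated with seeded k-means, a standard semi-supervised baseline in which labeled instances initialize the centroids. This is a swapped-in, simpler technique, not either of the thesis's algorithms: it performs hard clustering in the full feature space and does no subspace search. The data and seed labels below are hypothetical.

```python
import math


def _mean(pts):
    """Component-wise mean of a non-empty list of equal-length points."""
    d = len(pts[0])
    return tuple(sum(p[i] for p in pts) / len(pts) for i in range(d))


def seeded_kmeans(points, seeds, k, iters=20):
    """Seeded k-means: labeled points (seeds: index -> cluster id) initialize the centroids."""
    centroids = []
    for c in range(k):
        members = [points[i] for i, lab in seeds.items() if lab == c]
        if not members:
            raise ValueError(f"no labeled seed for cluster {c}")
        centroids.append(_mean(members))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point (labeled or not) to its nearest centroid.
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        new = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            new.append(_mean(members) if members else centroids[c])
        if new == centroids:  # converged
            break
        centroids = new
    return assign, centroids


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
seeds = {0: 0, 3: 1}  # only two instances are labeled
assign, centroids = seeded_kmeans(points, seeds, k=2)
```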
The different proposals are tested on different real and synthetic databases, and comparisons to other methods are included where appropriate. Finally, as an example of a real and current application, different machine learning techniques, including one of the proposals of this work (the most sophisticated one), are applied to a task from one of the most challenging biological problems today: modeling the human brain. Specifically, expert neuroscientists do not agree on a neuron classification for the cerebral cortex, which not only makes any modeling attempt impossible but also hinders day-to-day work, since there is no common way to name neurons. Therefore, machine learning techniques may help to reach an accepted solution to this problem, which could be an important milestone for future research in neuroscience.
Abstract:
The presented work proposes a new approach for anomaly detection. This approach is based on changes in a population of evolving agents under stress. If conditions are appropriate, changes in the population (modeled by the bioindicators) are representative of alterations to the environment. This approach, based on an ecological view, functionally improves on traditional approaches to anomaly detection. To verify this assertion, experiments based on Network Intrusion Detection Systems are presented. The results are compared with the behaviour of other bioinspired approaches and machine learning techniques.
Abstract:
Mobile activity recognition focuses on inferring the current activities of a mobile user by leveraging the sensory data available on today's smartphones. The state of the art in mobile activity recognition uses traditional classification learning techniques. Thus, the learning process typically involves: i) collecting labelled sensory data, which is transferred to and collated in a centralised repository; ii) model building, where the classification model is trained and tested using the collected data; and iii) model deployment, where the learnt model is deployed on board a mobile device to identify activities from new sensory data. In this paper, we demonstrate the Mobile Activity Recognition System (MARS), in which, for the first time, the model is built and continuously updated on board the mobile device itself using data stream mining. The advantages of the on-board approach are that it allows model personalisation and increases privacy, as the data is not sent to any external site. Furthermore, MARS enables prompt adaptation when the user or their activity profile changes. MARS has been implemented on the Android platform to demonstrate that it can achieve accurate mobile activity recognition. Moreover, we show in practice that MARS quickly adapts to user profile changes while remaining scalable and efficient in its consumption of device resources.
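The contrast between offline training and on-board stream learning can be illustrated with a classifier that updates itself one labeled sample at a time. The sketch below uses a nearest-centroid model with running-mean (or, optionally, exponentially weighted) centroid updates; it is an assumed stand-in chosen for brevity, not the stream-mining algorithm MARS uses, and the feature vectors and activity labels are hypothetical.

```python
import math


class IncrementalCentroidClassifier:
    """Nearest-centroid classifier whose class centroids are updated one labeled sample at a time.

    alpha=None keeps an exact running mean; a value in (0, 1] uses an exponential
    moving average instead, so older samples are gradually forgotten.
    """

    def __init__(self, alpha=None):
        self.alpha = alpha
        self.counts = {}
        self.centroids = {}

    def update(self, x, label):
        n = self.counts.get(label, 0) + 1
        self.counts[label] = n
        if label not in self.centroids:
            self.centroids[label] = list(x)
            return
        rate = 1.0 / n if self.alpha is None else self.alpha
        c = self.centroids[label]
        for i, v in enumerate(x):
            c[i] += rate * (v - c[i])  # move the centroid towards the new sample

    def predict(self, x):
        if not self.centroids:
            return None
        return min(self.centroids, key=lambda lab: math.dist(x, self.centroids[lab]))


# Hypothetical feature vectors, e.g. (mean, variance) of an accelerometer window.
clf = IncrementalCentroidClassifier()
clf.update((0.0, 0.0), "sitting")
clf.update((2.0, 2.0), "sitting")
clf.update((10.0, 10.0), "walking")
```

Setting alpha to a constant such as 0.1 makes each centroid track recent samples more closely, which is one simple way to adapt when a user's activity profile changes.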
Abstract:
Atrial fibrillation (AF) is a common heart disorder. One of the most prominent hypotheses about its initiation and maintenance posits multiple uncoordinated activation foci inside the atrium. However, all the signal processing techniques used for AF, such as dominant frequency and organization analysis, implicitly assume the existence of a single regular component in the observed signals. In this paper we account for the existence of multiple foci, performing a spectral analysis to detect their number and frequencies. To obtain a cleaner signal on which to perform the spectral analysis, we introduce sparsity-aware learning techniques to infer the spike trains corresponding to the activations. The good performance of the proposed algorithm is demonstrated on both synthetic and real data. SUMMARY. An algorithm based on sparse regression techniques for extracting cardiac signals in patients with atrial fibrillation (AF).
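One standard sparsity-aware way to infer a spike train is to treat the observed signal as a known pulse shape convolved with a sparse activation sequence and solve the lasso problem minimize 0.5*||y - Hx||^2 + lam*||x||_1 with the iterative shrinkage-thresholding algorithm (ISTA). The sketch below assumes a known, causal pulse shape h, a noiseless synthetic signal and lam = 0.05; it illustrates the technique generically and is not the authors' algorithm, which the abstract does not specify.

```python
def convolve(x, h):
    """Forward model y = H x: causal convolution of spikes x with pulse h, truncated to len(x)."""
    n_out = len(x)
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(n_out)]


def correlate(r, h):
    """Adjoint H^T r of the truncated convolution above."""
    n_out = len(r)
    return [sum(h[k] * r[m + k] for k in range(len(h)) if m + k < n_out) for m in range(n_out)]


def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(y, h, lam=0.05, iters=500):
    """Minimise 0.5*||y - H x||^2 + lam*||x||_1 by iterative soft thresholding."""
    step = 1.0 / sum(abs(c) for c in h) ** 2  # ||H||^2 <= ||h||_1^2, so this step is safe
    x = [0.0] * len(y)
    for _ in range(iters):
        residual = [yi - pi for yi, pi in zip(y, convolve(x, h))]
        grad = correlate(residual, h)  # equals minus the gradient of the data term
        x = [soft_threshold(xi + step * gi, step * lam) for xi, gi in zip(x, grad)]
    return x


# Synthetic example: two activations at samples 10 and 30, assumed pulse shape h.
h = [1.0, 0.6, 0.3]
x_true = [0.0] * 50
x_true[10] = x_true[30] = 1.0
y = convolve(x_true, h)
x_hat = ista(y, h)
support = [i for i, v in enumerate(x_hat) if abs(v) > 0.1]
```

The recovered spike train could then be passed to a spectral analysis to estimate how many periodic foci are present and at which frequencies; that step is not shown.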
Abstract:
The project arises from the need to develop improved teaching methodologies in the field of the mechanics of continuous media. The objective is to offer students a learning process through which they acquire the necessary theoretical knowledge, cognitive skills, and the responsibility and autonomy needed for professional development in this area. Traditionally, the concepts of these subjects were taught through lectures and laboratory practice. During these lessons the students' attitude was usually passive, and the lessons were therefore not very effective. The proposed methodology has already been used successfully at universities such as the University of Bochum (Germany) and the University of South Australia, and aims to improve the effectiveness of knowledge acquisition through the students' use of a virtual laboratory. This laboratory makes it possible to adapt the curricula and learning techniques to the European Higher Education Area and to improve current learning processes at the University School of Public Works Engineers (EUITOP) of the Technical University of Madrid (UPM), since there are no laboratories for this specialization. The virtual space is created with OpenSim, a software platform that manages 3D virtual worlds, together with LSL (Linden Scripting Language), which gives objects specific behaviours. Students or users can access this virtual world through their avatar (their character in the virtual world) and can carry out practical exercises within the space created for this purpose at any time, needing only a computer with Internet access and a viewer. The virtual laboratory has three areas: the virtual meeting rooms, where avatars can interact with peers, solve problems and exchange documentation available in the virtual library; the interactive game room, where avatars have to solve a number of problems within a time limit; and the video room, where students can watch instructional videos and receive group lessons.
Each interactive audiovisual element is accompanied by explanations that frame it within the area of knowledge, enabling students to begin acquiring the vocabulary and practice of the profession for which they are being trained. Plane elasticity concepts are introduced through tension and compression tests on steel and concrete specimens. The behaviour of reticulated and articulated structures is reinforced through interactive games, and the concepts of tension, compression, and local and global buckling will be studied through tests that break articulated structures. Pure bending and simple and composite torsion will be studied by observing a flexible specimen. Earthquake-resistant design of buildings will be examined through a video of a laboratory test.
Abstract:
Data mining is a field of computer science concerned with discovering patterns in large volumes of data. Its goal is to produce information similar to what a human expert could produce. It is also the process of discovering interesting knowledge, such as patterns, associations, changes, anomalies and meaningful structures, in large amounts of data held in databases, data warehouses or any other storage medium. Machine learning is a branch of artificial intelligence whose goal is to develop techniques that allow computers to learn. More specifically, it involves creating programs that can generalize behaviors from unstructured information supplied in the form of examples. Data mining uses machine learning methods to discover and enumerate the patterns present in the data. In recent years, classification and machine learning techniques have been applied in many areas, such as healthcare, commerce and security. A very topical example is the detection of fraudulent behavior and transactions in banks. One application of interest is to reuse fraud-detection techniques to identify users inside smart environments without requiring an authentication process.
Verifying that these techniques are effective during the analysis phase of a given solution requires a platform. That platform must support the development, validation and evaluation of learning and classification algorithms in the application environments under study. The proposed project aims to build a platform for evaluating machine learning algorithms as identification mechanisms in smart spaces. The project will study both the algorithms typical of these techniques and the platforms that currently exist, in order to define a set of specific requirements for the new platform. After this analysis, the platform will be partially developed. It will then be validated with proofs of concept and finally verified in a research environment yet to be defined.
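The kind of evaluation loop such a platform is meant to support can be sketched as a k-nearest-neighbour classifier that identifies users from feature vectors, scored with k-fold cross-validation. This is a minimal sketch under stated assumptions. The abstract names no specific algorithm, feature set or validation protocol, so the k-NN classifier, the two-dimensional features, the user labels and the `SAMPLES` data are all hypothetical and serve only as illustration.

```python
import math
from collections import Counter

# Hypothetical labelled samples: (feature vector, user id).
# In a smart space these might be sensor-derived behaviour features;
# the values here are invented purely for illustration.
SAMPLES = [
    ((1.0, 1.1), "alice"), ((0.9, 1.0), "alice"), ((1.1, 0.9), "alice"),
    ((5.0, 5.2), "bob"), ((5.1, 4.9), "bob"), ((4.9, 5.0), "bob"),
]


def knn_predict(train, query, k=3):
    """Return the majority user id among the k training samples closest to query."""
    neighbours = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(user for _, user in neighbours)
    return votes.most_common(1)[0][0]


def cross_validated_accuracy(samples, folds=3, k=3):
    """Mean identification accuracy over `folds` interleaved train/test splits."""
    scores = []
    for f in range(folds):
        test = samples[f::folds]  # every folds-th sample, starting at index f
        train = [s for i, s in enumerate(samples) if i % folds != f]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(cross_validated_accuracy(SAMPLES))
```

The evaluation function and the classifier are separate. You can replace `knn_predict` with another hypothetical classifier and keep `cross_validated_accuracy` fixed, which is the kind of repeatable side-by-side comparison the platform is intended to enable.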
Resumo:
Since the early days of the Internet, Internet Service Providers (ISPs) have needed to treat users' traffic differently according to agreements between the ISP and its customers. This procedure, known as Quality of Service (QoS) Management, has changed little in recent years; DiffServ and Deep Packet Inspection have been the most widely used mechanisms. However, the steady growth in Internet users and services, together with recent Machine Learning techniques, makes it possible to go one step further in the smart management of network traffic. In this paper, we first survey current tools and techniques for QoS Management. We then introduce clustering and classification Machine Learning techniques for traffic characterization, along with the concept of Quality of Experience. Finally, we combine all these components into a new framework for smart QoS management in a telecom Big Data scenario, covering both mobile and fixed communications.
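The clustering step for traffic characterization can be sketched as follows. The code groups flows with k-means and then assigns a new flow to its nearest centroid, which also works as a very simple classifier. The abstract does not give the paper's actual features or algorithms. The two normalised features (mean packet size and packet rate), the `FLOWS` values, and the deterministic farthest-point initialisation (a simplification of k-means++) are assumptions made for illustration.

```python
import math

# Hypothetical flows described by two normalised features:
# (mean packet size, packet rate), both scaled to [0, 1].
FLOWS = [
    (0.10, 0.05), (0.12, 0.08), (0.08, 0.06),  # small, sparse packets
    (0.90, 0.85), (0.95, 0.90), (0.88, 0.92),  # large, dense packets
]


def farthest_point_init(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, max_iter=50):
    """Lloyd's algorithm: alternate assignment and mean update until stable."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its members; keep it if empty.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


centroids, labels = kmeans(FLOWS, 2)

if __name__ == "__main__":
    print(labels)
    print(nearest((0.11, 0.07), centroids))  # cluster index for a new flow
```

In a QoS setting, each resulting cluster could then be mapped to a treatment class. That mapping, and any Quality of Experience weighting, is outside this sketch.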