818 resultados para Machine learning,Keras,Tensorflow,Data parallelism,Model parallelism,Container,Docker
Resumo:
—Microarray-based global gene expression profiling, with the use of sophisticated statistical algorithms is providing new insights into the pathogenesis of autoimmune diseases. We have applied a novel statistical technique for gene selection based on machine learning approaches to analyze microarray expression data gathered from patients with systemic lupus erythematosus (SLE) and primary antiphospholipid syndrome (PAPS), two autoimmune diseases of unknown genetic origin that share many common features. The methodology included a combination of three data discretization policies, a consensus gene selection method, and a multivariate correlation measurement. A set of 150 genes was found to discriminate SLE and PAPS patients from healthy individuals. Statistical validations demonstrate the relevance of this gene set from an univariate and multivariate perspective. Moreover, functional characterization of these genes identified an interferon-regulated gene signature, consistent with previous reports. It also revealed the existence of other regulatory pathways, including those regulated by PTEN, TNF, and BCL-2, which are altered in SLE and PAPS. Remarkably, a significant number of these genes carry E2F binding motifs in their promoters, projecting a role for E2F in the regulation of autoimmunity.
Resumo:
This work explores the automatic recognition of physical activity intensity patterns from multi-axial accelerometry and heart rate signals. Data collection was carried out in free-living conditions and in three controlled gymnasium circuits, for a total amount of 179.80 h of data divided into: sedentary situations (65.5%), light-to-moderate activity (17.6%) and vigorous exercise (16.9%). The proposed machine learning algorithms comprise the following steps: time-domain feature definition, standardization and PCA projection, unsupervised clustering (by k-means and GMM) and a HMM to account for long-term temporal trends. Performance was evaluated by 30 runs of a 10-fold cross-validation. Both k-means and GMM-based approaches yielded high overall accuracy (86.97% and 85.03%, respectively) and, given the imbalance of the dataset, meritorious F-measures (up to 77.88%) for non-sedentary cases. Classification errors tended to be concentrated around transients, what constrains their practical impact. Hence, we consider our proposal to be suitable for 24 h-based monitoring of physical activity in ambulatory scenarios and a first step towards intensity-specific energy expenditure estimators
Resumo:
Pragmatism is the leading motivation of regularization. We can understand regularization as a modification of the maximum-likelihood estimator so that a reasonable answer could be given in an unstable or ill-posed situation. To mention some typical examples, this happens when fitting parametric or non-parametric models with more parameters than data or when estimating large covariance matrices. Regularization is usually used, in addition, to improve the bias-variance tradeoff of an estimation. Then, the definition of regularization is quite general, and, although the introduction of a penalty is probably the most popular type, it is just one out of multiple forms of regularization. In this dissertation, we focus on the applications of regularization for obtaining sparse or parsimonious representations, where only a subset of the inputs is used. A particular form of regularization, L1-regularization, plays a key role for reaching sparsity. Most of the contributions presented here revolve around L1-regularization, although other forms of regularization are explored (also pursuing sparsity in some sense). In addition to present a compact review of L1-regularization and its applications in statistical and machine learning, we devise methodology for regression, supervised classification and structure induction of graphical models. Within the regression paradigm, we focus on kernel smoothing learning, proposing techniques for kernel design that are suitable for high dimensional settings and sparse regression functions. We also present an application of regularized regression techniques for modeling the response of biological neurons. Supervised classification advances deal, on the one hand, with the application of regularization for obtaining a na¨ıve Bayes classifier and, on the other hand, with a novel algorithm for brain-computer interface design that uses group regularization in an efficient manner. Finally, we present a heuristic for inducing structures of Gaussian Bayesian networks using L1-regularization as a filter. El pragmatismo es la principal motivación de la regularización. Podemos entender la regularización como una modificación del estimador de máxima verosimilitud, de tal manera que se pueda dar una respuesta cuando la configuración del problema es inestable. A modo de ejemplo, podemos mencionar el ajuste de modelos paramétricos o no paramétricos cuando hay más parámetros que casos en el conjunto de datos, o la estimación de grandes matrices de covarianzas. Se suele recurrir a la regularización, además, para mejorar el compromiso sesgo-varianza en una estimación. Por tanto, la definición de regularización es muy general y, aunque la introducción de una función de penalización es probablemente el método más popular, éste es sólo uno de entre varias posibilidades. En esta tesis se ha trabajado en aplicaciones de regularización para obtener representaciones dispersas, donde sólo se usa un subconjunto de las entradas. En particular, la regularización L1 juega un papel clave en la búsqueda de dicha dispersión. La mayor parte de las contribuciones presentadas en la tesis giran alrededor de la regularización L1, aunque también se exploran otras formas de regularización (que igualmente persiguen un modelo disperso). Además de presentar una revisión de la regularización L1 y sus aplicaciones en estadística y aprendizaje de máquina, se ha desarrollado metodología para regresión, clasificación supervisada y aprendizaje de estructura en modelos gráficos. Dentro de la regresión, se ha trabajado principalmente en métodos de regresión local, proponiendo técnicas de diseño del kernel que sean adecuadas a configuraciones de alta dimensionalidad y funciones de regresión dispersas. También se presenta una aplicación de las técnicas de regresión regularizada para modelar la respuesta de neuronas reales. Los avances en clasificación supervisada tratan, por una parte, con el uso de regularización para obtener un clasificador naive Bayes y, por otra parte, con el desarrollo de un algoritmo que usa regularización por grupos de una manera eficiente y que se ha aplicado al diseño de interfaces cerebromáquina. Finalmente, se presenta una heurística para inducir la estructura de redes Bayesianas Gaussianas usando regularización L1 a modo de filtro.
Resumo:
Services in smart environments pursue to increase the quality of people?s lives. The most important issues when developing this kind of environments is testing and validating such services. These tasks usually imply high costs and annoying or unfeasible real-world testing. In such cases, artificial societies may be used to simulate the smart environment (i.e. physical environment, equipment and humans). With this aim, the CHROMUBE methodology guides test engineers when modeling human beings. Such models reproduce behaviors which are highly similar to the real ones. Originally, these models are based on automata whose transitions are governed by random variables. Automaton?s structure and the probability distribution functions of each random variable are determined by a manual test and error process. In this paper, it is presented an alternative extension of this methodology which avoids the said manual process. It is based on learning human behavior patterns automatically from sensor data by using machine learning techniques. The presented approach has been tested on a real scenario, where this extension has given highly accurate human behavior models,
Resumo:
El aprendizaje automático y la cienciometría son las disciplinas científicas que se tratan en esta tesis. El aprendizaje automático trata sobre la construcción y el estudio de algoritmos que puedan aprender a partir de datos, mientras que la cienciometría se ocupa principalmente del análisis de la ciencia desde una perspectiva cuantitativa. Hoy en día, los avances en el aprendizaje automático proporcionan las herramientas matemáticas y estadísticas para trabajar correctamente con la gran cantidad de datos cienciométricos almacenados en bases de datos bibliográficas. En este contexto, el uso de nuevos métodos de aprendizaje automático en aplicaciones de cienciometría es el foco de atención de esta tesis doctoral. Esta tesis propone nuevas contribuciones en el aprendizaje automático que podrían arrojar luz sobre el área de la cienciometría. Estas contribuciones están divididas en tres partes: Varios modelos supervisados (in)sensibles al coste son aprendidos para predecir el éxito científico de los artículos y los investigadores. Los modelos sensibles al coste no están interesados en maximizar la precisión de clasificación, sino en la minimización del coste total esperado derivado de los errores ocasionados. En este contexto, los editores de revistas científicas podrían disponer de una herramienta capaz de predecir el número de citas de un artículo en el fututo antes de ser publicado, mientras que los comités de promoción podrían predecir el incremento anual del índice h de los investigadores en los primeros años. Estos modelos predictivos podrían allanar el camino hacia nuevos sistemas de evaluación. Varios modelos gráficos probabilísticos son aprendidos para explotar y descubrir nuevas relaciones entre el gran número de índices bibliométricos existentes. En este contexto, la comunidad científica podría medir cómo algunos índices influyen en otros en términos probabilísticos y realizar propagación de la evidencia e inferencia abductiva para responder a preguntas bibliométricas. Además, la comunidad científica podría descubrir qué índices bibliométricos tienen mayor poder predictivo. Este es un problema de regresión multi-respuesta en el que el papel de cada variable, predictiva o respuesta, es desconocido de antemano. Los índices resultantes podrían ser muy útiles para la predicción, es decir, cuando se conocen sus valores, el conocimiento de cualquier valor no proporciona información sobre la predicción de otros índices bibliométricos. Un estudio bibliométrico sobre la investigación española en informática ha sido realizado bajo la cultura de publicar o morir. Este estudio se basa en una metodología de análisis de clusters que caracteriza la actividad en la investigación en términos de productividad, visibilidad, calidad, prestigio y colaboración internacional. Este estudio también analiza los efectos de la colaboración en la productividad y la visibilidad bajo diferentes circunstancias. ABSTRACT Machine learning and scientometrics are the scientific disciplines which are covered in this dissertation. Machine learning deals with the construction and study of algorithms that can learn from data, whereas scientometrics is mainly concerned with the analysis of science from a quantitative perspective. Nowadays, advances in machine learning provide the mathematical and statistical tools for properly working with the vast amount of scientometrics data stored in bibliographic databases. In this context, the use of novel machine learning methods in scientometrics applications is the focus of attention of this dissertation. This dissertation proposes new machine learning contributions which would shed light on the scientometrics area. These contributions are divided in three parts: Several supervised cost-(in)sensitive models are learned to predict the scientific success of articles and researchers. Cost-sensitive models are not interested in maximizing classification accuracy, but in minimizing the expected total cost of the error derived from mistakes in the classification process. In this context, publishers of scientific journals could have a tool capable of predicting the citation count of an article in the future before it is published, whereas promotion committees could predict the annual increase of the h-index of researchers within the first few years. These predictive models would pave the way for new assessment systems. Several probabilistic graphical models are learned to exploit and discover new relationships among the vast number of existing bibliometric indices. In this context, scientific community could measure how some indices influence others in probabilistic terms and perform evidence propagation and abduction inference for answering bibliometric questions. Also, scientific community could uncover which bibliometric indices have a higher predictive power. This is a multi-output regression problem where the role of each variable, predictive or response, is unknown beforehand. The resulting indices could be very useful for prediction purposes, that is, when their index values are known, knowledge of any index value provides no information on the prediction of other bibliometric indices. A scientometric study of the Spanish computer science research is performed under the publish-or-perish culture. This study is based on a cluster analysis methodology which characterizes the research activity in terms of productivity, visibility, quality, prestige and international collaboration. This study also analyzes the effects of collaboration on productivity and visibility under different circumstances.
Resumo:
El objetivo principal de este proyecto ha sido introducir aprendizaje automático en la aplicación FleSe. FleSe es una aplicación web que permite realizar consultas borrosas sobre bases de datos nítidos. Para llevar a cabo esta función la aplicación utiliza unos criterios para definir los conceptos borrosos usados para llevar a cabo las consultas. FleSe además permite que el usuario cambie estas personalizaciones. Es aquí donde introduciremos el aprendizaje automático, de tal manera que los criterios por defecto cambien y aprendan en función de las personalizaciones que van realizando los usuarios. Los objetivos secundarios han sido familiarizarse con el desarrollo y diseño web, al igual que recordar y ampliar el conocimiento sobre lógica borrosa y el lenguaje de programación lógica Ciao-Prolog. A lo largo de la realización del proyecto y sobre todo después del estudio de los resultados se demuestra que la agrupación de los usuarios marca la diferencia con la última versión de la aplicación. Esto se basa en la siguiente idea, podemos usar un algoritmo de aprendizaje automático sobre las personalizaciones de los criterios de todos los usuarios, pero la gran diversidad de opiniones de los usuarios puede llevar al algoritmo a concluir criterios erróneos o no representativos. Para solucionar este problema agrupamos a los usuarios intentando que cada grupo tengan la misma opinión o mismo criterio sobre el concepto. Y después de haber realizado las agrupaciones usar el algoritmo de aprendizaje automático para precisar el criterio por defecto de cada grupo de usuarios. Como posibles mejoras para futuras versiones de la aplicación FleSe sería un mejor control y manejo del ejecutable plserver. Este archivo se encarga de permitir a la aplicación web usar el lenguaje de programación lógica Ciao-Prolog para llevar a cabo la lógica borrosa relacionada con las consultas. Uno de los problemas más importantes que ofrece plserver es que bloquea el hilo de ejecución al intentar cargar un archivo con errores y en caso de ocurrir repetidas veces bloquea todas las peticiones siguientes bloqueando la aplicación. Pensando en los usuarios y posibles clientes, sería también importante permitir que FleSe trabajase con bases de datos de SQL en vez de almacenar la base de datos en los archivos de Prolog. Otra posible mejora basarse en distintas características a la hora de agrupar los usuarios dependiendo de los conceptos borrosos que se van ha utilizar en las consultas. Con esto se conseguiría que para cada concepto borroso, se generasen distintos grupos de usuarios, los cuales tendrían opiniones distintas sobre el concepto en cuestión. Así se generarían criterios por defecto más precisos para cada usuario y cada concepto borroso.---ABSTRACT---The main objective of this project has been to introduce machine learning in the application FleSe. FleSe is a web application that makes fuzzy queries over databases with precise information, using defined criteria to define the fuzzy concepts used by the queries. The application allows the users to change and custom these criteria. On this point is where the machine learning would be introduced, so FleSe learn from every new user customization of the criteria in order to generate a new default value of it. The secondary objectives of this project were get familiar with web development and web design in order to understand the how the application works, as well as refresh and improve the knowledge about fuzzy logic and logic programing. During the realization of the project and after the study of the results, I realized that clustering the users in different groups makes the difference between this new version of the application and the previous. This conclusion follows the next idea, we can use an algorithm to introduce machine learning over the criteria that people have, but the problem is the diversity of opinions and judgements that exists, making impossible to generate a unique correct criteria for all the users. In order to solve this problem, before using the machine learning methods, we cluster the users in order to make groups that have the same opinion, and afterwards, use the machine learning methods to precise the default criteria of each users group. The future improvements that could be important for the next versions of FleSe will be to control better the behaviour of the plserver file, that cost many troubles at the beginning of this project and it also generate important errors in the previous version. The file plserver allows the web application to use Ciao-Prolog, a logic programming language that control and manage all the fuzzy logic. One of the main problems with plserver is that when the user uploads a file with errors, it will block the thread and when this happens multiple times it will start blocking all the requests. Oriented to the customer, would be important as well to allow FleSe to manage and work with SQL databases instead of store the data in the Prolog files. Another possible improvement would that the cluster algorithm would be based on different criteria depending on the fuzzy concepts that the selected Prolog file have. This will generate more meaningful clusters, and therefore, the default criteria offered to the users will be more precise.
Resumo:
We present a machine learning-based system for automatically computing interpretable, quantitative measures of animal behavior. Through our interactive system, users encode their intuition about behavior by annotating a small set of video frames. These manual labels are converted into classifiers that can automatically annotate behaviors in screen-scale data sets. Our general-purpose system can create a variety of accurate individual and social behavior classifiers for different organisms, including mice and adult and larval Drosophila.
Resumo:
An emerging issue in the field of astronomy is the integration, management and utilization of databases from around the world to facilitate scientific discovery. In this paper, we investigate application of the machine learning techniques of support vector machines and neural networks to the problem of amalgamating catalogues of galaxies as objects from two disparate data sources: radio and optical. Formulating this as a classification problem presents several challenges, including dealing with a highly unbalanced data set. Unlike the conventional approach to the problem (which is based on a likelihood ratio) machine learning does not require density estimation and is shown here to provide a significant improvement in performance. We also report some experiments that explore the importance of the radio and optical data features for the matching problem.
Resumo:
The purpose of the work is to claim that engineers can be motivated to study statistical concepts by using the applications in their experience connected with Statistical ideas. The main idea is to choose a data from the manufacturing factility (for example, output from CMM machine) and explain that even if the parts used do not meet exact specifications they are used in production. By graphing the data one can show that the error is random but follows a distribution, that is, there is regularily in the data in statistical sense. As the error distribution is continuous, we advocate that the concept of randomness be introducted starting with continuous random variables with probabilities connected with areas under the density. The discrete random variables are then introduced in terms of decision connected with size of the errors before generalizing to abstract concept of probability. Using software, they can then be motivated to study statistical analysis of the data they encounter and the use of this analysis to make engineering and management decisions.
Resumo:
In machine learning, Gaussian process latent variable model (GP-LVM) has been extensively applied in the field of unsupervised dimensionality reduction. When some supervised information, e.g., pairwise constraints or labels of the data, is available, the traditional GP-LVM cannot directly utilize such supervised information to improve the performance of dimensionality reduction. In this case, it is necessary to modify the traditional GP-LVM to make it capable of handing the supervised or semi-supervised learning tasks. For this purpose, we propose a new semi-supervised GP-LVM framework under the pairwise constraints. Through transferring the pairwise constraints in the observed space to the latent space, the constrained priori information on the latent variables can be obtained. Under this constrained priori, the latent variables are optimized by the maximum a posteriori (MAP) algorithm. The effectiveness of the proposed algorithm is demonstrated with experiments on a variety of data sets. © 2010 Elsevier B.V.
Resumo:
Gun related violence is a complex issue and accounts for a large proportion of violent incidents. In the research reported in this paper, we set out to investigate the pro-gun and anti-gun sentiments expressed on a social media platform, namely Twitter, in response to the 2012 Sandy Hook Elementary School shooting in Connecticut, USA. Machine learning techniques are applied to classify a data corpus of over 700,000 tweets. The sentiments are captured using a public sentiment score that considers the volume of tweets as well as population. A web-based interactive tool is developed to visualise the sentiments and is available at this http://www.gunsontwitter.com. The key findings from this research are: (i) There are elevated rates of both pro-gun and anti-gun sentiments on the day of the shooting. Surprisingly, the pro-gun sentiment remains high for a number of days following the event but the anti-gun sentiment quickly falls to pre-event levels. (ii) There is a different public response from each state, with the highest pro-gun sentiment not coming from those with highest gun ownership levels but rather from California, Texas and New York.
Resumo:
Thesis (Ph.D.)--University of Washington, 2016-08
Resumo:
Evolutionary algorithms alone cannot solve optimization problems very efficiently since there are many random (not very rational) decisions in these algorithms. Combination of evolutionary algorithms and other techniques have been proven to be an efficient optimization methodology. In this talk, I will explain the basic ideas of our three algorithms along this line (1): Orthogonal genetic algorithm which treats crossover/mutation as an experimental design problem, (2) Multiobjective evolutionary algorithm based on decomposition (MOEA/D) which uses decomposition techniques from traditional mathematical programming in multiobjective optimization evolutionary algorithm, and (3) Regular model based multiobjective estimation of distribution algorithms (RM-MEDA) which uses the regular property and machine learning methods for improving multiobjective evolutionary algorithms.
Resumo:
Dissertação de Mestrado, Ciências da Linguagem, Faculdade de Ciências Humanas e Sociais, Universidade do Algarve, 2010