856 resultados para regression algorithm
Resumo:
Recent advances in machine learning methods enable increasingly the automatic construction of various types of computer assisted methods that have been difficult or laborious to program by human experts. The tasks for which this kind of tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. The machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight of the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms incorporating this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, the incorporation of prior knowledge is often done by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit to the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suitable for the task of information retrieval and for more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms such as text categorization, and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions. We also design a fast cross-validation algorithm for regularized least-squares type of learning algorithm. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and novel advanced kernels and cost functions can be used in algorithms efficiently.
Resumo:
Coherent anti-Stokes Raman scattering is the powerful method of laser spectroscopy in which significant successes are achieved. However, the non-linear nature of CARS complicates the analysis of the received spectra. The objective of this Thesis is to develop a new phase retrieval algorithm for CARS. It utilizes the maximum entropy method and the new wavelet approach for spectroscopic background correction of a phase function. The method was developed to be easily automated and used on a large number of spectra of different substances.. The algorithm was successfully tested on experimental data.
Resumo:
BACKGROUND: Delirium and frailty - both potentially reversible geriatric syndromes - are seldom studied together, although they often occur jointly in older patients discharged from hospitals. This study aimed to explore the relationship between delirium and frailty in older adults discharged from hospitals. METHODS: Of the 221 patients aged >65 years, who were invited to participate, only 114 gave their consent to participate in this study. Delirium was assessed using the confusion assessment method, in which patients were classified dichotomously as delirious or nondelirious according to its algorithm. Frailty was assessed using the Edmonton Frailty Scale, which classifies patients dichotomously as frail or nonfrail. In addition to the sociodemographic characteristics, covariates such as scores from the Mini-Mental State Examination, Instrumental Activities of Daily Living scale, and Cumulative Illness Rating Scale for Geriatrics and details regarding polymedication were collected. A multidimensional linear regression model was used for analysis. RESULTS: Almost 20% of participants had delirium (n=22), and 76.3% were classified as frail (n=87); 31.5% of the variance in the delirium score was explained by frailty (R (2)=0.315). Age; polymedication; scores of the Confusion Assessment Method (CAM), instrumental activities of daily living, and Cumulative Illness Rating Scale for Geriatrics; and frailty increased the predictability of the variance of delirium by 32% to 64% (R (2)=0.64). CONCLUSION: Frailty is strongly related to delirium in older patients after discharge from the hospital.
Resumo:
Fetal MRI reconstruction aims at finding a high-resolution image given a small set of low-resolution images. It is usually modeled as an inverse problem where the regularization term plays a central role in the reconstruction quality. Literature has considered several regularization terms s.a. Dirichlet/Laplacian energy [1], Total Variation (TV)based energies [2,3] and more recently non-local means [4]. Although TV energies are quite attractive because of their ability in edge preservation, standard explicit steepest gradient techniques have been applied to optimize fetal-based TV energies. The main contribution of this work lies in the introduction of a well-posed TV algorithm from the point of view of convex optimization. Specifically, our proposed TV optimization algorithm for fetal reconstruction is optimal w.r.t. the asymptotic and iterative convergence speeds O(1/n(2)) and O(1/root epsilon), while existing techniques are in O(1/n) and O(1/epsilon). We apply our algorithm to (1) clinical newborn data, considered as ground truth, and (2) clinical fetal acquisitions. Our algorithm compares favorably with the literature in terms of speed and accuracy.
Resumo:
This paper describes Question Waves, an algorithm that can be applied to social search protocols, such as Asknext or Sixearch. In this model, the queries are propagated through the social network, with faster propagation through more trustable acquaintances. Question Waves uses local information to make decisions and obtain an answer ranking. With Question Waves, the answers that arrive first are the most likely to be relevant, and we computed the correlation of answer relevance with the order of arrival to demonstrate this result. We obtained correlations equivalent to the heuristics that use global knowledge, such as profile similarity among users or the expertise value of an agent. Because Question Waves is compatible with the social search protocol Asknext, it is possible to stop a search when enough relevant answers have been found; additionally, stopping the search early only introduces a minimal risk of not obtaining the best possible answer. Furthermore, Question Waves does not require a re-ranking algorithm because the results arrive sorted
Resumo:
Two speed management policies were implemented in the metropolitan area of Barcelona aimed at reducing air pollution concentration levels. In 2008, the maximum speed limit was reduced to 80 km/h and, in 2009, a variable speed system was introduced on some metropolitan motorways. This paper evaluates whether such policies have been successful in promoting cleaner air, not only in terms of mean pollutant levels but also during high and low pollution episodes. We use a quantile regression approach for fixed effect panel data. We find that the variable speed system improves air quality with regard to the two pollutants considered here, being most effective when nitrogen oxide levels are not too low and when particulate matter concentrations are below extremely high levels. However, reducing the maximum speed limit from 120/100 km/h to 80 km/h has no effect – or even a slightly increasing effect –on the two pollutants, depending on the pollution scenario. Length: 32 pages
Resumo:
El problema de la regresión simbólica consiste en el aprendizaje, a partir de un conjunto muestra de datos obtenidos experimentalmente, de una función desconocida. Los métodos evolutivos han demostrado su eficiencia en la resolución de instancias de dicho problema. En este proyecto se propone una nueva estrategia evolutiva, a través de algoritmos genéticos, basada en una nueva estructura de datos denominada Straight Line Program (SLP) y que representa en este caso expresiones simbólicas. A partir de un SLP universal, que depende de una serie de parámetros cuya especialización proporciona SLP's concretos del espacio de búsqueda, la estrategia trata de encontrar los parámetros óptimos para que el SLP universal represente la función que mejor se aproxime al conjunto de puntos muestra. De manera conceptual, este proyecto consiste en un entrenamiento genético del SLP universal, utilizando los puntos muestra como conjunto de entrenamiento, para resolver el problema de la regresión simbólica.
Resumo:
This paper proposes a pose-based algorithm to solve the full SLAM problem for an autonomous underwater vehicle (AUV), navigating in an unknown and possibly unstructured environment. The technique incorporate probabilistic scan matching with range scans gathered from a mechanical scanning imaging sonar (MSIS) and the robot dead-reckoning displacements estimated from a Doppler velocity log (DVL) and a motion reference unit (MRU). The proposed method utilizes two extended Kalman filters (EKF). The first, estimates the local path travelled by the robot while grabbing the scan as well as its uncertainty and provides position estimates for correcting the distortions that the vehicle motion produces in the acoustic images. The second is an augment state EKF that estimates and keeps the registered scans poses. The raw data from the sensors are processed and fused in-line. No priory structural information or initial pose are considered. The algorithm has been tested on an AUV guided along a 600 m path within a marina environment, showing the viability of the proposed approach
Resumo:
Image segmentation of natural scenes constitutes a major problem in machine vision. This paper presents a new proposal for the image segmentation problem which has been based on the integration of edge and region information. This approach begins by detecting the main contours of the scene which are later used to guide a concurrent set of growing processes. A previous analysis of the seed pixels permits adjustment of the homogeneity criterion to the region's characteristics during the growing process. Since the high variability of regions representing outdoor scenes makes the classical homogeneity criteria useless, a new homogeneity criterion based on clustering analysis and convex hull construction is proposed. Experimental results have proven the reliability of the proposed approach
Resumo:
Genetic algorithm was used for variable selection in simultaneous determination of mixtures of glucose, maltose and fructose by mid infrared spectroscopy. Different models, using partial least squares (PLS) and multiple linear regression (MLR) with and without data pre-processing, were used. Based on the results obtained, it was verified that a simpler model (multiple linear regression with variable selection by genetic algorithm) produces results comparable to more complex methods (partial least squares). The relative errors obtained for the best model was around 3% for the sugar determination, which is acceptable for this kind of determination.
Resumo:
Learning of preference relations has recently received significant attention in machine learning community. It is closely related to the classification and regression analysis and can be reduced to these tasks. However, preference learning involves prediction of ordering of the data points rather than prediction of a single numerical value as in case of regression or a class label as in case of classification. Therefore, studying preference relations within a separate framework facilitates not only better theoretical understanding of the problem, but also motivates development of the efficient algorithms for the task. Preference learning has many applications in domains such as information retrieval, bioinformatics, natural language processing, etc. For example, algorithms that learn to rank are frequently used in search engines for ordering documents retrieved by the query. Preference learning methods have been also applied to collaborative filtering problems for predicting individual customer choices from the vast amount of user generated feedback. In this thesis we propose several algorithms for learning preference relations. These algorithms stem from well founded and robust class of regularized least-squares methods and have many attractive computational properties. In order to improve the performance of our methods, we introduce several non-linear kernel functions. Thus, contribution of this thesis is twofold: kernel functions for structured data that are used to take advantage of various non-vectorial data representations and the preference learning algorithms that are suitable for different tasks, namely efficient learning of preference relations, learning with large amount of training data, and semi-supervised preference learning. Proposed kernel-based algorithms and kernels are applied to the parse ranking task in natural language processing, document ranking in information retrieval, and remote homology detection in bioinformatics domain. Training of kernel-based ranking algorithms can be infeasible when the size of the training set is large. This problem is addressed by proposing a preference learning algorithm whose computation complexity scales linearly with the number of training data points. We also introduce sparse approximation of the algorithm that can be efficiently trained with large amount of data. For situations when small amount of labeled data but a large amount of unlabeled data is available, we propose a co-regularized preference learning algorithm. To conclude, the methods presented in this thesis address not only the problem of the efficient training of the algorithms but also fast regularization parameter selection, multiple output prediction, and cross-validation. Furthermore, proposed algorithms lead to notably better performance in many preference learning tasks considered.
Resumo:
We adapt the Shout and Act algorithm to Digital Objects Preservation where agents explore file systems looking for digital objects to be preserved (victims). When they find something they “shout” so that agent mates can hear it. The louder the shout, the urgent or most important the finding is. Louder shouts can also refer to closeness. We perform several experiments to show that this system works very scalably, showing that heterogeneous teams of agents outperform homogeneous ones over a wide range of tasks complexity. The target at-risk documents are MS Office documents (including an RTF file) with Excel content or in Excel format. Thus, an interesting conclusion from the experiments is that fewer heterogeneous (varying skills) agents can equal the performance of many homogeneous (combined super-skilled) agents, implying significant performance increases with lower overall cost growth. Our results impact the design of Digital Objects Preservation teams: a properly designed combination of heterogeneous teams is cheaper and more scalable when confronted with uncertain maps of digital objects that need to be preserved. A cost pyramid is proposed for engineers to use for modeling the most effective agent combinations