809 resultados para Machine learning classification
Resumo:
Species distribution models (SDMs) are widely used to explain and predict species ranges and environmental niches. They are most commonly constructed by inferring species' occurrence-environment relationships using statistical and machine-learning methods. The variety of methods that can be used to construct SDMs (e.g. generalized linear/additive models, tree-based models, maximum entropy, etc.), and the variety of ways that such models can be implemented, permits substantial flexibility in SDM complexity. Building models with an appropriate amount of complexity for the study objectives is critical for robust inference. We characterize complexity as the shape of the inferred occurrence-environment relationships and the number of parameters used to describe them, and search for insights into whether additional complexity is informative or superfluous. By building 'under fit' models, having insufficient flexibility to describe observed occurrence-environment relationships, we risk misunderstanding the factors shaping species distributions. By building 'over fit' models, with excessive flexibility, we risk inadvertently ascribing pattern to noise or building opaque models. However, model selection can be challenging, especially when comparing models constructed under different modeling approaches. Here we argue for a more pragmatic approach: researchers should constrain the complexity of their models based on study objective, attributes of the data, and an understanding of how these interact with the underlying biological processes. We discuss guidelines for balancing under fitting with over fitting and consequently how complexity affects decisions made during model building. Although some generalities are possible, our discussion reflects differences in opinions that favor simpler versus more complex models. We conclude that combining insights from both simple and complex SDM building approaches best advances our knowledge of current and future species ranges.
Resumo:
Neuroimaging techniques provide valuable tools for diagnosing Alzheimer's disease (AD), monitoring disease progression and evaluating responses to treatment. There is currently a wide array of techniques available including computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), and, for recording electrical brain activity, electroencephalography (EEG). The choice of technique depends on the contrast between tissues of interest, spatial resolution, temporal resolution, requirements for functional data and the probable number of scans required. For example, while PET, CT and MRI can be used to differentiate between AD and other dementias, MRI is safer and provides better contrast of soft tissues. Neuroimaging is a technique spanning many disciplines and requires effective communication between doctors requesting a scan of a patient or group of patients and those with technical expertise. Consideration and discussion of the most suitable type of scan and the necessary settings to achieve the best results will help ensure appropriate techniques are chosen and used effectively. Neuroimaging techniques are currently expanding understanding of the structural and functional changes that occur in dementia. Further research may allow identification of early neurological signs ofAD, before clinical symptoms are evident, providing the opportunity to test preventative therapies. CombiningMRI and machine learning techniques may be a powerful approach to improve diagnosis ofAD and to predict clinical outcomes.
Resumo:
Projecte de recerca elaborat a partir d’una estada a la National University of Singapore Singapur, entre juliol i octubre del 2007. Donada l'explosió de la música a l'internet i la ràpida expansió de les col•leccions de música digital, un repte clau en l'àrea de la informació musical és el desenvolupament de sistemes de processament musical eficients i confiables. L'objectiu de la investigació proposada ha estat treballar en diferents aspectes de l'extracció, modelatge i processat del contingut musical. En particular, s’ha treballat en l'extracció, l'anàlisi i la manipulació de descriptors d'àudio de baix nivell, el modelatge de processos musicals, l'estudi i desenvolupament de tècniques d'aprenentatge automàtic per a processar àudio, i la identificació i extracció d'atributs musicals d'alt nivell. S’han revisat i millorat alguns components d'anàlisis d'àudio i revisat components per a l'extracció de descriptors inter-nota i intra-nota en enregistraments monofónics d'àudio. S’ha aplicat treball previ en Tempo a la formalització de diferents tasques musicals. Finalment, s’ha investigat el processat d'alt nivell de música basandonos en el seu contingut. Com exemple d'això, s’ha investigat com músics professionals expressen i comuniquen la seva interpretació del contingut musical i emocional de peces musicals, i hem usat aquesta informació per a identificar automàticament intèrprets. S’han estudiat les desviacions en paràmetres com to, temps, amplitud i timbre a nivell inter-nota i intra-nota.
Mejora diagnóstica de hepatopatías de afectación difusa mediante técnicas de inteligencia artificial
Resumo:
The automatic diagnostic discrimination is an application of artificial intelligence techniques that can solve clinical cases based on imaging. Diffuse liver diseases are diseases of wide prominence in the population and insidious course, yet early in its progression. Early and effective diagnosis is necessary because many of these diseases progress to cirrhosis and liver cancer. The usual technique of choice for accurate diagnosis is liver biopsy, an invasive and not without incompatibilities one. It is proposed in this project an alternative non-invasive and free of contraindications method based on liver ultrasonography. The images are digitized and then analyzed using statistical techniques and analysis of texture. The results are validated from the pathology report. Finally, we apply artificial intelligence techniques as Fuzzy k-Means or Support Vector Machines and compare its significance to the analysis Statistics and the report of the clinician. The results show that this technique is significantly valid and a promising alternative as a noninvasive diagnostic chronic liver disease from diffuse involvement. Artificial Intelligence classifying techniques significantly improve the diagnosing discrimination compared to other statistics.
Resumo:
En aquest projecte es presenta l’aplicació per a dispositius mòbils Doppelganger. La seva funció és, a partir d’una fotografia, detectar la cara i mostrar la persona famosa de la nostra base de dades que més s’assembla a la persona en la fotografia. Per la implementació s’han utilitzat algoritmes de visió per computador i d’aprenentatge automàtic com per exemple el PCA i el K-Nearest Neighbor, tot utilitzant llibreries gratuïtes com són les OpenCV.
Resumo:
Among various advantages, their small size makes model organisms preferred subjects of investigation. Yet, even in model systems detailed analysis of numerous developmental processes at cellular level is severely hampered by their scale. For instance, secondary growth of Arabidopsis hypocotyls creates a radial pattern of highly specialized tissues that comprises several thousand cells starting from a few dozen. This dynamic process is difficult to follow because of its scale and because it can only be investigated invasively, precluding comprehensive understanding of the cell proliferation, differentiation, and patterning events involved. To overcome such limitation, we established an automated quantitative histology approach. We acquired hypocotyl cross-sections from tiled high-resolution images and extracted their information content using custom high-throughput image processing and segmentation. Coupled with automated cell type recognition through machine learning, we could establish a cellular resolution atlas that reveals vascular morphodynamics during secondary growth, for example equidistant phloem pole formation. DOI: http://dx.doi.org/10.7554/eLife.01567.001.
Resumo:
Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation or/and inference algorithms. Recent advances in machine learning offer a novel approach to model spatial distribution of petrophysical properties in complex reservoirs alternative to geostatistics. The approach is based of semisupervised learning, which handles both ?labelled? observed data and ?unlabelled? data, which have no measured value but describe prior knowledge and other relevant data in forms of manifolds in the input space where the modelled property is continuous. Proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic geological features and describe stochastic variability and non-uniqueness of spatial properties. On the other hand, it is able to capture and preserve key spatial dependencies such as connectivity of high permeability geo-bodies, which is often difficult in contemporary petroleum reservoir studies. Semi-supervised SVR as a data driven algorithm is designed to integrate various kind of conditioning information and learn dependences from it. The semi-supervised SVR model is able to balance signal/noise levels and control the prior belief in available data. In this work, stochastic semi-supervised SVR geomodel is integrated into Bayesian framework to quantify uncertainty of reservoir production with multiple models fitted to past dynamic observations (production history). Multiple history matched models are obtained using stochastic sampling and/or MCMC-based inference algorithms, which evaluate posterior probability distribution. Uncertainty of the model is described by posterior probability of the model parameters that represent key geological properties: spatial correlation size, continuity strength, smoothness/variability of spatial property distribution. The developed approach is illustrated with a fluvial reservoir case. The resulting probabilistic production forecasts are described by uncertainty envelopes. The paper compares the performance of the models with different combinations of unknown parameters and discusses sensitivity issues.
Resumo:
Emotions are crucial for user's decision making in recommendation processes. We first introduce ambient recommender systems, which arise from the analysis of new trends on the exploitation of the emotional context in the next generation of recommender systems. We then explain some results of these new trends in real-world applications through the smart prediction assistant (SPA) platform in an intelligent learning guide with more than three million users. While most approaches to recommending have focused on algorithm performance. SPA makes recommendations to users on the basis of emotional information acquired in an incremental way. This article provides a cross-disciplinary perspective to achieve this goal in such recommender systems through a SPA platform. The methodology applied in SPA is the result of a bunch of technology transfer projects for large real-world rccommender systems
Resumo:
Nowadays, the joint exploitation of images acquired daily by remote sensing instruments and of images available from archives allows a detailed monitoring of the transitions occurring at the surface of the Earth. These modifications of the land cover generate spectral discrepancies that can be detected via the analysis of remote sensing images. Independently from the origin of the images and of type of surface change, a correct processing of such data implies the adoption of flexible, robust and possibly nonlinear method, to correctly account for the complex statistical relationships characterizing the pixels of the images. This Thesis deals with the development and the application of advanced statistical methods for multi-temporal optical remote sensing image processing tasks. Three different families of machine learning models have been explored and fundamental solutions for change detection problems are provided. In the first part, change detection with user supervision has been considered. In a first application, a nonlinear classifier has been applied with the intent of precisely delineating flooded regions from a pair of images. In a second case study, the spatial context of each pixel has been injected into another nonlinear classifier to obtain a precise mapping of new urban structures. In both cases, the user provides the classifier with examples of what he believes has changed or not. In the second part, a completely automatic and unsupervised method for precise binary detection of changes has been proposed. The technique allows a very accurate mapping without any user intervention, resulting particularly useful when readiness and reaction times of the system are a crucial constraint. In the third, the problem of statistical distributions shifting between acquisitions is studied. Two approaches to transform the couple of bi-temporal images and reduce their differences unrelated to changes in land cover are studied. The methods align the distributions of the images, so that the pixel-wise comparison could be carried out with higher accuracy. Furthermore, the second method can deal with images from different sensors, no matter the dimensionality of the data nor the spectral information content. This opens the doors to possible solutions for a crucial problem in the field: detecting changes when the images have been acquired by two different sensors.
Resumo:
Student guidance is an always desired characteristic in any educational system, butit represents special difficulty if it has to be deployed in an automated way to fulfilsuch needs in a computer supported educational tool. In this paper we explorepossible avenues relying on machine learning techniques, to be included in a nearfuture -in the form of a tutoring navigational tool- in a teleeducation platform -InterMediActor- currently under development. Since no data from that platform isavailable yet, the preliminary experiments presented in this paper are builtinterpreting every subject in the Telecommunications Degree at Universidad CarlosIII de Madrid as an aggregated macro-competence (following the methodologicalconsiderations in InterMediActor), such that marks achieved by students can beused as data for the models, to be replaced in a near future by real data directlymeasured inside InterMediActor. We evaluate the predictability of students qualifications, and we deploy a preventive early detection system -failure alert-, toidentify those students more prone to fail a certain subject such that correctivemeans can be deployed with sufficient anticipation.
Resumo:
The paper proposes an approach aimed at detecting optimal model parameter combinations to achieve the most representative description of uncertainty in the model performance. A classification problem is posed to find the regions of good fitting models according to the values of a cost function. Support Vector Machine (SVM) classification in the parameter space is applied to decide if a forward model simulation is to be computed for a particular generated model. SVM is particularly designed to tackle classification problems in high-dimensional space in a non-parametric and non-linear way. SVM decision boundaries determine the regions that are subject to the largest uncertainty in the cost function classification, and, therefore, provide guidelines for further iterative exploration of the model space. The proposed approach is illustrated by a synthetic example of fluid flow through porous media, which features highly variable response due to the parameter values' combination.
Resumo:
This paper presents a review of methodology for semi-supervised modeling with kernel methods, when the manifold assumption is guaranteed to be satisfied. It concerns environmental data modeling on natural manifolds, such as complex topographies of the mountainous regions, where environmental processes are highly influenced by the relief. These relations, possibly regionalized and nonlinear, can be modeled from data with machine learning using the digital elevation models in semi-supervised kernel methods. The range of the tools and methodological issues discussed in the study includes feature selection and semisupervised Support Vector algorithms. The real case study devoted to data-driven modeling of meteorological fields illustrates the discussed approach.
Resumo:
Uncertainty quantification of petroleum reservoir models is one of the present challenges, which is usually approached with a wide range of geostatistical tools linked with statistical optimisation or/and inference algorithms. The paper considers a data driven approach in modelling uncertainty in spatial predictions. Proposed semi-supervised Support Vector Regression (SVR) model has demonstrated its capability to represent realistic features and describe stochastic variability and non-uniqueness of spatial properties. It is able to capture and preserve key spatial dependencies such as connectivity, which is often difficult to achieve with two-point geostatistical models. Semi-supervised SVR is designed to integrate various kinds of conditioning data and learn dependences from them. A stochastic semi-supervised SVR model is integrated into a Bayesian framework to quantify uncertainty with multiple models fitted to dynamic observations. The developed approach is illustrated with a reservoir case study. The resulting probabilistic production forecasts are described by uncertainty envelopes.
Resumo:
Prediction of species' distributions is central to diverse applications in ecology, evolution and conservation science. There is increasing electronic access to vast sets of occurrence records in museums and herbaria, yet little effective guidance on how best to use this information in the context of numerous approaches for modelling distributions. To meet this need, we compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date. We used presence-only data to fit models, and independent presence-absence data to evaluate the predictions. Along with well-established modelling methods such as generalised additive models and GARP and BIOCLIM, we explored methods that either have been developed recently or have rarely been applied to modelling species' distributions. These include machine-learning methods and community models, both of which have features that may make them particularly well suited to noisy or sparse information, as is typical of species' occurrence data. Presence-only data were effective for modelling species' distributions for many species and regions. The novel methods consistently outperformed more established methods. The results of our analysis are promising for the use of data from museums and herbaria, especially as methods suited to the noise inherent in such data improve.