8 results for High-dimensional data visualization

in Repositório Científico do Instituto Politécnico de Lisboa - Portugal


Relevance: 100.00%

Abstract:

Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), using search-based or wrapper techniques can be computationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take a prohibitively long time on high-dimensional datasets. In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria, applicable to supervised, semi-supervised, and unsupervised learning, and can act as pre-processors that let computationally intensive methods focus on smaller subsets of promising features. The experimental results, with up to 10^5 features, show the time efficiency of our methods, with lower generalization error than state-of-the-art techniques, while being dramatically simpler and faster.
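The abstract does not state which relevance and redundancy criteria the authors use. As a minimal sketch of the general relevance/redundancy filter idea, the Python below assumes variance as an unsupervised relevance score and absolute Pearson correlation as the redundancy measure; the function names and the 0.9 threshold are illustrative choices, not taken from the paper.

```python
from math import sqrt
from statistics import pvariance


def pearson(a, b):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def select_features(columns, max_features, max_similarity=0.9):
    """Rank features by variance (assumed relevance score), then greedily
    keep a feature only if it is not too correlated with any kept feature."""
    ranked = sorted(range(len(columns)),
                    key=lambda j: pvariance(columns[j]), reverse=True)
    kept = []
    for j in ranked:
        if len(kept) == max_features:
            break
        if all(abs(pearson(columns[j], columns[k])) < max_similarity for k in kept):
            kept.append(j)
    return kept


# Column 1 is an exact multiple of column 0, so only one of them survives.
columns = [[1, 2, 3, 4], [2, 4, 6, 8], [5, 5, 5, 5], [1, 0, 1, 0]]
selected = select_features(columns, max_features=2)
```

Each feature is scored once and compared only against the features already kept, which is what keeps this kind of filter cheap relative to wrapper methods.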

Relevance: 100.00%

Abstract:

Background: With the decrease of DNA sequencing costs, sequence-based typing methods are rapidly becoming the gold standard for epidemiological surveillance. These methods provide the reproducible and comparable results needed for global-scale bacterial population analysis, while retaining their usefulness for local epidemiological surveys. Online databases that collect the generated allelic profiles and associated epidemiological data are available, but this wealth of data remains underused and is frequently poorly annotated, since no user-friendly tool exists to analyze and explore it. Results: PHYLOViZ is platform-independent Java software that allows the integrated analysis of sequence-based typing methods, including SNP data generated from whole-genome sequencing approaches, and associated epidemiological data. goeBURST and its Minimum Spanning Tree expansion are used for visualizing the possible evolutionary relationships between isolates. The results can be displayed as an annotated graph overlaying the query results of any other epidemiological data available. Conclusions: PHYLOViZ is user-friendly software that allows the combined analysis of multiple data sources for microbial epidemiological and population studies. It is freely available at http://www.phyloviz.net.
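To illustrate the minimum-spanning-tree idea mentioned above, the following Python sketch builds a tree over allelic profiles using Hamming distance and plain Prim's algorithm. It is not goeBURST, which applies its own tie-breaking rules that the abstract does not describe, and it does not use PHYLOViZ code; the profiles and function names are hypothetical.

```python
def hamming(p, q):
    """Number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(p, q))


def minimum_spanning_tree(profiles):
    """Prim's algorithm over the complete Hamming-distance graph.
    Returns a list of (i, j, distance) edges."""
    n = len(profiles)
    if n == 0:
        return []
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge from the tree to a profile outside it; ties are
        # broken by index order, which is an arbitrary choice here.
        d, i, j = min(
            (hamming(profiles[i], profiles[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j, d))
        in_tree.add(j)
    return edges


# Hypothetical four-locus profiles for four isolates.
profiles = [(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 2, 2), (3, 3, 3, 3)]
tree = minimum_spanning_tree(profiles)
```

The resulting edge list can be passed to any graph-drawing tool; isolates that differ at a single locus end up adjacent in the tree.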

Relevance: 100.00%

Abstract:

In machine learning and pattern recognition tasks, the use of feature discretization techniques may have several advantages. The discretized features may hold enough information for the learning task at hand, while ignoring minor fluctuations that are irrelevant or harmful for that task. The discretized features have more compact representations that may yield both better accuracy and lower training time, as compared to the use of the original features. However, in many cases, mainly with medium and high-dimensional data, the large number of features usually implies that there is some redundancy among them. Thus, we may further apply feature selection (FS) techniques on the discrete data, keeping the most relevant features, while discarding the irrelevant and redundant ones. In this paper, we propose relevance and redundancy criteria for supervised feature selection techniques on discrete data. These criteria are applied to the bin-class histograms of the discrete features. The experimental results, on public benchmark data, show that the proposed criteria can achieve better accuracy than widely used relevance and redundancy criteria, such as mutual information and the Fisher ratio.
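The proposed criteria themselves are not given in the abstract. As a point of reference, the sketch below computes mutual information, one of the baseline criteria named above, directly from the bin-class histogram of a discrete feature. The function name and the toy data are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(feature_bins, labels):
    """Mutual information (in bits) between a discrete feature and the
    class label, estimated from their joint bin-class histogram."""
    n = len(labels)
    joint = Counter(zip(feature_bins, labels))   # bin-class histogram
    bin_counts = Counter(feature_bins)           # marginal over bins
    class_counts = Counter(labels)               # marginal over classes
    mi = 0.0
    for (b, c), n_bc in joint.items():
        # p(b,c) * log2( p(b,c) / (p(b) * p(c)) ), written with counts
        mi += (n_bc / n) * log2(n_bc * n / (bin_counts[b] * class_counts[c]))
    return mi


labels = ["a", "a", "b", "b"]
informative = mutual_information([0, 0, 1, 1], labels)    # bin determines class
uninformative = mutual_information([0, 1, 0, 1], labels)  # bin independent of class
```

A feature whose bins separate the classes perfectly scores the full class entropy, while a feature that is independent of the class scores zero, which is why the measure is a common relevance score.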

Relevance: 100.00%

Abstract:

Feature discretization (FD) techniques often yield adequate and compact representations of the data, suitable for machine learning and pattern recognition problems. Compared with the original features, these representations usually decrease the training time and yield higher classification accuracy, while allowing humans to better understand and visualize the data. This paper proposes two new FD techniques. The first one is based on the well-known Linde-Buzo-Gray quantization algorithm, coupled with a relevance criterion, and is able to perform unsupervised, supervised, or semi-supervised discretization. The second technique works in supervised mode and is based on the maximization of the mutual information between each discrete feature and the class label. Our experimental results on standard benchmark datasets show that these techniques scale up to high-dimensional data, attaining in many cases better accuracy than existing unsupervised and supervised FD approaches, while using fewer discretization intervals.
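As a rough illustration of the first ingredient only, here is a one-dimensional Linde-Buzo-Gray quantizer in Python: the codebook is grown by splitting each codeword and refined with Lloyd iterations. The relevance criterion that the paper couples with it is not described in the abstract and is not reproduced; the parameter names, the fixed iteration count, and the power-of-two restriction are simplifying assumptions.

```python
def lbg_quantizer(values, n_levels, eps=1e-3, n_iter=20):
    """1-D LBG codebook design. n_levels must be a power of two because
    the codebook doubles at every splitting step (simplifying assumption)."""
    if n_levels < 1 or n_levels & (n_levels - 1):
        raise ValueError("n_levels must be a power of two")
    codebook = [sum(values) / len(values)]  # start from the global mean
    while len(codebook) < n_levels:
        # Split every codeword into a slightly perturbed pair.
        codebook = [c + s for c in codebook for s in (-eps, eps)]
        for _ in range(n_iter):
            # Assign each value to its nearest codeword.
            cells = [[] for _ in codebook]
            for v in values:
                k = min(range(len(codebook)), key=lambda i: abs(v - codebook[i]))
                cells[k].append(v)
            # Move each codeword to its cell mean; keep it if the cell is empty.
            codebook = [sum(cell) / len(cell) if cell else c
                        for cell, c in zip(cells, codebook)]
    return sorted(codebook)


def discretize(values, codebook):
    """Map each value to the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda i: abs(v - codebook[i]))
            for v in values]


values = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
codebook = lbg_quantizer(values, n_levels=2)
bins = discretize(values, codebook)
```

Applied per feature, the codeword indices become the discrete feature values; a supervised variant would additionally score each candidate codebook against the class labels.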

Relevance: 100.00%

Abstract:

Many learning problems require handling high-dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the large number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. There is thus a clear need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.

Relevance: 100.00%

Abstract:

Knowledge of the anisotropic properties beneath the Iberian Peninsula and Northern Morocco has improved dramatically since late 2007 with the analysis of the data provided by the dense TopoIberia broadband seismic network, the increasing number of permanent stations operating in Morocco, Portugal and Spain, and the contribution of smaller scale/higher resolution experiments. Results from the first two TopoIberia deployments revealed a spectacular rotation of the fast polarization direction (FPD) along the Gibraltar Arc, interpreted as evidence of mantle flow deflected around the high-velocity slab beneath the Alboran Sea, and a rather uniform N100°E FPD beneath the central Iberian Variscan Massif, consistent with global mantle flow models taking into account contributions of surface plate motion, density variations and net lithosphere rotation. The results from the last Iberarray deployment presented here, covering the northern part of the Iberian Peninsula, also show a rather uniform FPD orientation close to N100°E, thus confirming the previous interpretation globally relating the anisotropic parameters to the lattice-preferred orientation (LPO) of mantle minerals generated by mantle flow at asthenospheric depths. However, the degree of anisotropy varies significantly, from delay time values of around 0.5 s beneath NW Iberia to values reaching 2.0 s in its NE corner. The anisotropic parameters retrieved from single events providing high-quality data also show significant differences for stations located in the Variscan units of NW Iberia, suggesting that the region includes multiple anisotropic layers or complex anisotropy systems. These results allow us to complete the map of the anisotropic properties of the westernmost Mediterranean region, which can now be considered one of the best-constrained regions worldwide, with more than 300 sites investigated over an area extending from the Bay of Biscay to the Sahara platform. (C) 2015 Elsevier B.V. All rights reserved.

Relevance: 100.00%

Abstract:

Mainland Portugal, on the southwestern edge of the European continent, is located directly north of the boundary between the Eurasian and Nubian plates. It lies in a region of slow lithospheric deformation (< 5 mm/yr), which has generated some of the largest earthquakes in Europe, both intraplate (mainland) and interplate (offshore). Some offshore earthquakes nucleate in old and cold lithospheric mantle, at depths down to 60 km. The seismicity of mainland Portugal and its adjacent offshore has been repeatedly classified as diffuse. In this paper, we analyse the instrumental earthquake catalogue for western Iberia, which covers the period between 1961 and 2013. Between 2010 and 2012, the catalogue was enriched with data from dense broad-band deployments. We show that although the plate boundary south of Portugal is diffuse, in that deformation is accommodated along several distributed faults rather than along one long linear plate boundary, the seismicity itself is not diffuse. Rather, when located using high-quality data, earthquakes collapse into well-defined clusters and lineations. We identify and characterize the most outstanding clusters and lineations of epicentres and correlate them with geophysical and tectonic features (historical seismicity, topography, geologically mapped faults, Moho depth, free-air gravity, magnetic anomalies and geotectonic units). Both onshore and offshore, clusters and lineations of earthquakes are aligned preferentially NNE-SSW and WNW-ESE. Cumulative seismic moment and epicentre density decrease from south to north, with increasing distance from the plate boundary. Only a few earthquake lineations coincide with geologically mapped faults. Clusters and lineations that do not match geologically mapped faults may correspond to previously unmapped faults (e.g. blind faults), rheological boundaries or distributed fracturing inside blocks that are more brittle and therefore break more easily than neighbouring blocks.
The seismicity map of western Iberia presented in this article raises important questions concerning the regional seismotectonics. This work shows that the study of low-magnitude earthquakes using dense seismic deployments is a powerful tool for studying lithospheric deformation in slowly deforming regions, such as western Iberia, where high-magnitude earthquakes occur with long recurrence intervals.
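The abstract does not say how cumulative seismic moment was computed from the catalogue. One common approach, sketched below in Python under that caveat, converts moment magnitudes to scalar moments with the standard relation M0 = 10^(1.5 Mw + 9.1) N·m and sums them in latitude bands. The event list is invented for illustration and is not catalogue data; a real catalogue reporting local magnitudes would first need a magnitude conversion.

```python
import math


def seismic_moment(mw):
    """Scalar seismic moment in N·m from moment magnitude Mw."""
    return 10 ** (1.5 * mw + 9.1)


def cumulative_moment_by_latitude(events, band_width=1.0):
    """Sum seismic moment in latitude bands.
    events: iterable of (latitude_deg, mw) pairs.
    Returns {band_start_latitude: summed_moment}, ordered south to north."""
    totals = {}
    for lat, mw in events:
        band = math.floor(lat / band_width) * band_width
        totals[band] = totals.get(band, 0.0) + seismic_moment(mw)
    return dict(sorted(totals.items()))


# Hypothetical events: (latitude in degrees, Mw).
events = [(36.2, 4.0), (36.8, 4.0), (40.5, 2.0)]
totals = cumulative_moment_by_latitude(events)
```

Because moment grows by a factor of about 32 per magnitude unit, the band totals are dominated by the largest events, so such profiles are usually read alongside epicentre density.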

Relevance: 40.00%

Abstract:

We propose a 3-D gravity model for the volcanic structure of the island of Maio (Cape Verde archipelago) with the objective of resolving some open questions concerning the geometry and depth of the intrusive Central Igneous Complex. A gravity survey was made covering almost the entire surface of the island. The gravity data were inverted through a non-linear 3-D approach, which provided a model constructed in a random growth process. The residual Bouguer gravity field shows a single positive anomaly with an elliptic shape and a NW-SE-trending long axis. This Bouguer gravity anomaly is slightly off-centred relative to the island, but its outline is concordant with the surface exposure of the Central Igneous Complex. The gravimetric modelling shows a high-density volume whose centre of mass is about 4500 m deep. With increasing depth, and despite the restricted gravimetric resolution, the horizontal sections of the model suggest the presence of two distinct bodies, whose relative position accounts for the elongated shape of the high positive Bouguer gravity anomaly. These bodies are interpreted as magma chambers whose coeval volcanic counterparts are no longer preserved. The orientation defined by the two bodies is similar to that of other structures known in the southern group of the Cape Verde islands, thus suggesting a possible structural control constraining the location of the plutonic intrusions.