9 resultados para High-dimensional indexing
em Repositório Científico do Instituto Politécnico de Lisboa - Portugal
Resumo:
Feature selection is a central problem in machine learning and pattern recognition. On large datasets (in terms of dimension and/or number of instances), using search-based or wrapper techniques can be cornputationally prohibitive. Moreover, many filter methods based on relevance/redundancy assessment also take a prohibitively long time on high-dimensional. datasets. In this paper, we propose efficient unsupervised and supervised feature selection/ranking filters for high-dimensional datasets. These methods use low-complexity relevance and redundancy criteria, applicable to supervised, semi-supervised, and unsupervised learning, being able to act as pre-processors for computationally intensive methods to focus their attention on smaller subsets of promising features. The experimental results, with up to 10(5) features, show the time efficiency of our methods, with lower generalization error than state-of-the-art techniques, while being dramatically simpler and faster.
Resumo:
Many learning problems require handling high dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality, and need to address it in order to be effective. Examples of these types of data include the bag-of-words representation in text classification problems and gene expression data for tumor detection/classification. Usually, among the high number of features characterizing the instances, many may be irrelevant (or even detrimental) for the learning tasks. It is thus clear that there is a need for adequate techniques for feature representation, reduction, and selection, to improve both the classification accuracy and the memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium and high-dimensional datasets. The experimental results on several standard datasets, with both sparse and dense features, show the efficiency of the proposed techniques as well as improvements over previous related techniques.
Resumo:
Feature discretization (FD) techniques often yield adequate and compact representations of the data, suitable for machine learning and pattern recognition problems. These representations usually decrease the training time, yielding higher classification accuracy while allowing for humans to better understand and visualize the data, as compared to the use of the original features. This paper proposes two new FD techniques. The first one is based on the well-known Linde-Buzo-Gray quantization algorithm, coupled with a relevance criterion, being able perform unsupervised, supervised, or semi-supervised discretization. The second technique works in supervised mode, being based on the maximization of the mutual information between each discrete feature and the class label. Our experimental results on standard benchmark datasets show that these techniques scale up to high-dimensional data, attaining in many cases better accuracy than existing unsupervised and supervised FD approaches, while using fewer discretization intervals.
Resumo:
In machine learning and pattern recognition tasks, the use of feature discretization techniques may have several advantages. The discretized features may hold enough information for the learning task at hand, while ignoring minor fluctuations that are irrelevant or harmful for that task. The discretized features have more compact representations that may yield both better accuracy and lower training time, as compared to the use of the original features. However, in many cases, mainly with medium and high-dimensional data, the large number of features usually implies that there is some redundancy among them. Thus, we may further apply feature selection (FS) techniques on the discrete data, keeping the most relevant features, while discarding the irrelevant and redundant ones. In this paper, we propose relevance and redundancy criteria for supervised feature selection techniques on discrete data. These criteria are applied to the bin-class histograms of the discrete features. The experimental results, on public benchmark data, show that the proposed criteria can achieve better accuracy than widely used relevance and redundancy criteria, such as mutual information and the Fisher ratio.
Resumo:
Microcrystalline silicon is a two-phase material. Its composition can be interpreted as a series of grains of crystalline silicon imbedded in an amorphous silicon tissue, with a high concentration of dangling bonds in the transition regions. In this paper, results for the transport properties of a mu c-Si:H p-i-n junction obtained by means of two-dimensional numerical simulation are reported. The role played by the boundary regions between the crystalline grains and the amorphous matrix is taken into account and these regions are treated similar to a heterojunction interface. The device is analysed under AM1.5 illumination and the paper outlines the influence of the local electric field at the grain boundary transition regions on the internal electric configuration of the device and on the transport mechanism within the mu c-Si:H intrinsic layer.
Resumo:
A unified architecture for fast and efficient computation of the set of two-dimensional (2-D) transforms adopted by the most recent state-of-the-art digital video standards is presented in this paper. Contrasting to other designs with similar functionality, the presented architecture is supported on a scalable, modular and completely configurable processing structure. This flexible structure not only allows to easily reconfigure the architecture to support different transform kernels, but it also permits its resizing to efficiently support transforms of different orders (e. g. order-4, order-8, order-16 and order-32). Consequently, not only is it highly suitable to realize high-performance multi-standard transform cores, but it also offers highly efficient implementations of specialized processing structures addressing only a reduced subset of transforms that are used by a specific video standard. The experimental results that were obtained by prototyping several configurations of this processing structure in a Xilinx Virtex-7 FPGA show the superior performance and hardware efficiency levels provided by the proposed unified architecture for the implementation of transform cores for the Advanced Video Coding (AVC), Audio Video coding Standard (AVS), VC-1 and High Efficiency Video Coding (HEVC) standards. In addition, such results also demonstrate the ability of this processing structure to realize multi-standard transform cores supporting all the standards mentioned above and that are capable of processing the 8k Ultra High Definition Television (UHDTV) video format (7,680 x 4,320 at 30 fps) in real time.
Resumo:
Chitosan biocompatibility and biodegradability properties make this biopolymer promising for the development of advanced internal fixation devices for orthopedic applications. This work presents a detailed study on the production and characterization of three dimensional (3D) dense, non-porous, chitosan-based structures, with the ability to be processed in different shapes, and also with high strength and stiffness. Such features are crucial for the application of such 3D structures as bioabsorbable implantable devices. The influence of chitosan's molecular weight and the addition of one plasticizer (glycerol) on 3D dense chitosan-based products' biomechanical properties were explored. Several specimens were produced and in vitro studies were performed in order to assess the cytotoxicity of these specimens and their physical behavior throughout the enzymatic degradation experiments. The results point out that glycerol does not impact on cytotoxicity and has a high impact in improving mechanical properties, both elasticity and compressive strength. In addition, human mesenchymal stem/stromal cells (MSC) were used as an ex-vivo model to study cell adhesion and proliferation on these structures, showing promising results with fold increase values in total cell number similar to the ones obtained in standard cell culture flasks. (C) 2014 Elsevier Ltd. All rights reserved.
Resumo:
We propose a 3-D gravity model for the volcanic structure of the island of Maio (Cape Verde archipelago) with the objective of solving some open questions concerning the geometry and depth of the intrusive Central Igneous Complex. A gravity survey was made covering almost the entire surface of the island. The gravity data was inverted through a non-linear 3-D approach which provided a model constructed in a random growth process. The residual Bouguer gravity field shows a single positive anomaly presenting an elliptic shape with a NWSE trending long axis. This Bouguer gravity anomaly is slightly off-centred with the island but its outline is concordant with the surface exposure of the Central Igneous Complex. The gravimetric modelling shows a high-density volume whose centre of mass is about 4500 m deep. With increasing depth, and despite the restricted gravimetric resolution, the horizontal sections of the model suggest the presence of two distinct bodies, whose relative position accounts for the elongated shape of the high positive Bouguer gravity anomaly. These bodies are interpreted as magma chambers whose coeval volcanic counterparts are no longer preserved. The orientation defined by the two bodies is similar to that of other structures known in the southern group of the Cape Verde islands, thus suggesting a possible structural control constraining the location of the plutonic intrusions.
Resumo:
The design of magnetic cores can be carried out by taking into account the optimization of different parameters in accordance with the application requirements. Considering the specifications of the fast field cycling nuclear magnetic resonance (FFC-NMR) technique, the magnetic flux density distribution, at the sample insertion volume, is one of the core parameters that needs to be evaluated. Recently, it has been shown that the FFC-NMR magnets can be built on the basis of solenoid coils with ferromagnetic cores. Since this type of apparatus requires magnets with high magnetic flux density uniformity, a new type of magnet using a ferromagnetic core, copper coils, and superconducting blocks was designed with improved magnetic flux density distribution. In this paper, the designing aspects of the magnet are described and discussed with emphasis on the improvement of the magnetic flux density homogeneity (Delta B/B-0) in the air gap. The magnetic flux density distribution is analyzed based on 3-D simulations and NMR experimental results.