995 results for Feature types


Relevance:

30.00%

Abstract:

This paper presents a novel approach to the automatic classification of very large data sets composed of terahertz pulse transient signals, highlighting their potential use in biochemical, biomedical, pharmaceutical and security applications. Two different types of THz spectra are considered in the classification process. First, a binary classification study of poly-A and poly-C ribonucleic acid samples is performed. This is then contrasted with a difficult multi-class classification problem involving spectra from six different powder samples that, although they have fairly indistinguishable features in the optical spectrum, possess a few discernible spectral features in the terahertz part of the spectrum. Classification is performed using a complex-valued extreme learning machine algorithm that takes into account features in both the amplitude and the phase of the recorded spectra. Classification speed and accuracy are contrasted with those achieved using a support vector machine classifier. The study systematically compares the classifier performance achieved after adopting different Gaussian kernels when separating amplitude and phase signatures. The two signatures are presented as feature vectors for both training and testing purposes. The study confirms the utility of complex-valued extreme learning machine algorithms for classification of the very large data sets generated with current terahertz imaging spectrometers. The classifier can take into consideration heterogeneous layers within an object, as would be required within a tomographic setting, and is sufficiently robust to detect patterns hidden inside noisy terahertz data sets. The proposed study opens up the opportunity for the establishment of complex-valued extreme learning machine algorithms as new chemometric tools that will assist the wider proliferation of terahertz sensing technology for chemical sensing, quality control, security screening and clinical diagnosis. Furthermore, the proposed algorithm should also be very useful in other applications requiring the classification of very large data sets.

Relevance:

30.00%

Abstract:

Though sound symbolic words (onomatopoeia and mimetic words, or giongo and gitaigo in Japanese) exist in other languages, it is not easy to compare them with those in Japanese. This is because, unlike in Japanese, in many other languages (here we consider English and Spanish) sound symbolic words do not have distinctive forms that immediately separate them from other categories of words. In Japanese, a sound symbolic word has a radical (based on the elaborate Japanese sound symbolic system) and often a suffix that adds subtle nuance. Together they give the word a distinctive form that differentiates it from other categories of words, though its grammatical function can vary, especially in the case of mimetic words (gitaigo). Without such an obvious feature, it is not always easy to separate sound symbolic words from the rest in other languages. These expressions are extremely common and are used in almost all types of text in Japanese, but their elaborate sound symbolic system, and possibly their varied grammatical functions, make giongo and gitaigo one of the most difficult challenges for foreign students and translators. Studying the translation of these expressions into other languages might give some indication of how Japanese sound symbolic words compare with those in other languages. Though sound symbolic words are present in many types of text in Japanese, their functions in traditional forms of text (letters only) and in manga (Japanese comics) are different, and they should be treated separately. For example, in traditional types of text such as novels, the vast majority of the sound symbolic words used are mimetic words (gitaigo), and most of them are used as adverbs, whereas in manga, the majority of the sound symbolic words used (excluding those that appear within speech bubbles) are onomatopoeias (giongo), often used on their own (i.e. not as part of a sentence). Naturally, the techniques used to translate these expressions in these two types of documents differ greatly. The presentation will focus on i) the grammatical functions of Japanese sound symbolic words in traditional types of text (novels and poems) and in manga, and ii) whether their features and functions are maintained (i.e. whether they are translated as sound symbolic words) when translated into other languages (English and Spanish). The latter point bears on a comparison of sound symbolic words in Japanese and other languages, which will also be discussed.

Relevance:

30.00%

Abstract:

This paper investigates the application of neural networks to the recognition of lubrication defects typical of an industrial cold forging process employed by fastener manufacturers. The accurate recognition of lubrication errors, such as coating that is not applied properly or is damaged during material handling, is very important to the quality of the final product in fastener manufacture. Lubrication errors lead to increased forging loads and premature tool failure, as well as to increased defect sorting and re-processing of the coated rod. The lubrication coating provides a barrier between the work material and the die during the drawing operation; moreover, it needs to be sufficiently robust to remain on the wire during the transfer to the cold forging operation. In the cold forging operation the wire undergoes multi-stage deformation without the application of any additional lubrication. Four types of lubrication errors, typical of fastener production, were introduced to a set of sample rods, which were subsequently drawn under laboratory conditions. The drawing force was measured, and a limited set of features was extracted from it. The neural-network-based model learned from these features is able to recognize all types of lubrication errors with high accuracy. The overall accuracy of the neural network model is around 98%, with an almost uniform distribution of errors across the four error types and the normal condition.

Relevance:

30.00%

Abstract:

The finding and maintaining of high-accuracy foveation points for several types of recognised feature in log-polar space, such as a line or a circular or elliptical arc, is considered. Log-polar space is preferred over Cartesian space because it provides high resolution and a wide viewing angle; feature invariance in the fovea simplifies foveation; it allows multi-resolution analysis; and rotation and scaling become linear translations in log-polar space.
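The last property can be checked numerically. The sketch below is not taken from the paper: it assumes the usual mapping (x, y) -> (log r, theta) about the fixation point, and the function names `to_log_polar` and `scale_rotate` are illustrative.

```python
import math

def to_log_polar(x, y, cx=0.0, cy=0.0):
    """Map a Cartesian point to (log r, theta) about the centre (cx, cy)."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        raise ValueError("log-polar mapping is undefined at the centre")
    return math.log(r), math.atan2(dy, dx)

def scale_rotate(x, y, s, phi):
    """Scale by s and rotate by phi (radians) about the origin."""
    c, sn = math.cos(phi), math.sin(phi)
    return s * (c * x - sn * y), s * (sn * x + c * y)

# A point, and the same point after scaling by 2 and rotating by 30 degrees.
p = (3.0, 1.0)
q = scale_rotate(*p, s=2.0, phi=math.radians(30))

rho_p, theta_p = to_log_polar(*p)
rho_q, theta_q = to_log_polar(*q)

# Scaling shifts the log-radius axis by log(s);
# rotation shifts the angle axis by phi.
shift_rho = rho_q - rho_p
shift_theta = theta_q - theta_p
```

Here `shift_rho` comes out as log 2 and `shift_theta` as pi/6, independent of the point chosen. In general the angular shift holds modulo 2*pi, since `atan2` wraps at +/- pi.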

Relevance:

30.00%

Abstract:

Precision edge feature extraction is a very important step in vision. Researchers mainly use step edges to model an edge at subpixel level. In this paper we describe a new technique for two-dimensional edge feature extraction to subpixel accuracy using a general edge model. Using six basic edge types to model edges, the edge parameters at subpixel level are extracted by fitting a model to the image signal using a least-squared-error fitting technique.
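To make the idea of least-squares model fitting at subpixel level concrete, here is a minimal one-dimensional sketch. It covers only an ideal step edge, not the six edge types or the two-dimensional model of the paper, and it assumes each pixel records the average intensity over its unit-width footprint. The names `step_edge_weights` and `fit_step_edge` are illustrative.

```python
import numpy as np

def step_edge_weights(n, t):
    """Fraction of each unit-width pixel [i, i+1) lying left of the edge at t."""
    return np.clip(t - np.arange(n), 0.0, 1.0)

def fit_step_edge(signal, step=0.01):
    """Least-squares fit of (edge position, low level, high level).

    For a fixed edge position the model is linear in the two levels, so the
    levels are solved exactly and the position is found by a grid search.
    """
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    best = None
    for t in np.arange(step, n, step):
        w = step_edge_weights(n, t)
        A = np.column_stack([w, 1.0 - w])
        coef, *_ = np.linalg.lstsq(A, signal, rcond=None)
        err = float(np.sum((A @ coef - signal) ** 2))
        if best is None or err < best[0]:
            best = (err, float(t), float(coef[0]), float(coef[1]))
    return best[1:]

# Synthetic 8-pixel scan line: level 10 left of x = 3.37, level 50 right of it.
w_true = step_edge_weights(8, 3.37)
scan_line = 10.0 * w_true + 50.0 * (1.0 - w_true)

t_hat, low_hat, high_hat = fit_step_edge(scan_line)
```

On this noise-free scan line the fit recovers the edge position to within the grid spacing, even though the edge falls inside pixel 3. A finer grid or a continuous optimiser would replace the grid search in practice.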

Relevance:

30.00%

Abstract:

A good intrusion detection system gives accurate and efficient classification results. This ability is essential when building an intrusion detection system. In this paper, we focused on using various training functions with feature selection to achieve highly accurate results. The data used in our experiments are from NSL-KDD. However, the training and testing time needed to build the model is very high. To address this, we proposed feature selection based on information gain, which can detect several attack types with high accuracy and a low false rate. Moreover, we ran experiments to categorize each of the five classes (probe, denial of service (DoS), user to super-user (U2R), remote to local (R2L), and normal). Our proposed method outperforms other state-of-the-art methods.
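Information-gain feature ranking can be sketched in a few lines. The example below is a generic illustration for discrete features, not the authors' implementation; the toy records and feature names are hypothetical and are not NSL-KDD data. Continuous features would need to be discretized first.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """H(labels) - H(labels | feature) for one discrete feature."""
    total = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_features(columns, labels):
    """Return feature names sorted by decreasing information gain, plus the gains."""
    gains = {name: information_gain(vals, labels)
             for name, vals in columns.items()}
    return sorted(gains, key=gains.get, reverse=True), gains

# Toy records (hypothetical values, for illustration only).
labels = ["normal", "normal", "dos", "dos", "probe", "probe"]
columns = {
    "protocol": ["tcp", "tcp", "icmp", "icmp", "udp", "udp"],  # separates all classes
    "flag": ["SF", "SF", "S0", "S0", "SF", "SF"],              # separates dos only
    "logged_in": [1, 0, 1, 0, 1, 0],                           # uninformative
}
ranking, gains = rank_features(columns, labels)
```

Keeping only the top-ranked features shrinks the input to the classifier, which is how this kind of selection reduces training and testing time.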

Relevance:

30.00%

Abstract:

Modern healthcare is being reshaped by the growth of Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually represented in a tree form (e.g. the ICD-10 tree), and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related l1-penalty-based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to select one feature from a group of correlated features at random. This prevents clinicians from arriving at a stable feature set, which is crucial for the clinical decision-making process. In this paper, we solve this problem by using a recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study it and compare it with that of other feature selection methods. Using one synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso-based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our results have implications for identifying stable risk factors for many healthcare problems and can therefore potentially assist clinical decision making for accurate medical prognosis.
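One way to quantify the stability of a feature selection method is the average pairwise overlap between the feature sets it selects on different resamples of the data. The abstract does not say which stability measure the authors use, so the Jaccard-based sketch below is an assumption, and the feature codes are hypothetical ICD-10-style labels.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def selection_stability(feature_sets):
    """Mean pairwise Jaccard similarity over all pairs of selected sets."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical selections from three resamples of the same data.
# Picking a different sibling code each time lowers the overlap.
sibling_level = [{"I21.0", "E11"}, {"I21.1", "E11"}, {"I21.2", "E11"}]
# Selecting at the parent-node level keeps the sets closer together.
parent_level = [{"I21", "E11"}, {"I21", "E11"}, {"I21", "E10"}]

score_sibling = selection_stability(sibling_level)
score_parent = selection_stability(parent_level)
```

A score of 1 means every resample produced the same feature set; scores near 0 mean the selections barely overlap.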

Relevance:

30.00%

Abstract:

An important feature of life-cycle models is the presence of uncertainty regarding one’s labor income. Yet this issue, long recognized in different areas, has not received enough attention in the optimal taxation literature. This paper is an attempt to fill this gap. We write a simple three-period model in which agents gradually learn their productivities. In a framework akin to Mirrlees’ (1971) static one, we derive properties of optimal tax schedules and show that: i) if preferences are (weakly) separable, uniform taxation of goods is optimal, ii) if they are (strongly) separable capital income is to rate than other forms of investment.

Relevance:

30.00%

Abstract:

Background: One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using the microarray technique. Due to important differences in the probabilistic models of microarray and SAGE technologies, it is important to develop suitable techniques to select specific genes from SAGE measurements. Results: A new framework is proposed to select specific genes that distinguish different biological states based on the analysis of SAGE data. The new framework applies the bolstered error to identify strong genes that separate the biological states in a feature space defined by the gene expression of a training set. Credibility intervals defined from a probabilistic model of SAGE measurements are used to identify, among all gene groups selected by the strong genes method, the genes that distinguish the different states most reliably. A score that takes into account the credibility and the bolstered error values is proposed to rank the groups of genes considered. Results obtained using SAGE data from gliomas are presented, corroborating the introduced methodology. Conclusion: The model representing counting data, such as SAGE, provides additional statistical information that allows a more robust analysis. This additional statistical information is incorporated in the methodology described in the paper. The introduced method is suitable for identifying signature genes that lead to a good separation of the biological states using SAGE and may be adapted for other counting methods such as Massively Parallel Signature Sequencing (MPSS) or the recent Sequencing-By-Synthesis (SBS) technique. Some of the genes identified by the proposed method may be useful for generating classifiers.

Relevance:

30.00%

Abstract:

Data sets describing the state of the earth's atmosphere are of great importance in the atmospheric sciences. Over the last decades, the quality and sheer amount of the available data have increased significantly, resulting in a rising demand for new tools capable of handling and analysing these large, multidimensional sets of atmospheric data. The interdisciplinary work presented in this thesis covers the development and application of practical software tools and efficient algorithms from the field of computer science, with the goal of enabling atmospheric scientists to analyse and gain new insights from these large data sets. For this purpose, our tools combine novel techniques with well-established methods from different areas such as scientific visualization and data segmentation. In this thesis, three practical tools are presented. Two of these tools are software systems (Insight and IWAL) for different types of processing and interactive visualization of data; the third tool is an efficient algorithm for data segmentation implemented as part of Insight. Insight is a toolkit for the interactive, three-dimensional visualization and processing of large sets of atmospheric data, originally developed as a testing environment for the novel segmentation algorithm. It provides a dynamic system for combining data from different sources at runtime, a variety of data processing algorithms, and several visualization techniques. Its modular architecture and flexible scripting support led to additional applications of the software, of which two examples are presented: the use of Insight as a WMS (web map service) server, and the automatic production of a sequence of images for the visualization of cyclone simulations.
The core application of Insight is the provision of the novel segmentation algorithm for the efficient detection and tracking of 3D features in large sets of atmospheric data, as well as for the precise localization of the genesis, lysis, merging and splitting events that occur. Data segmentation usually leads to a significant reduction in the size of the data considered. This enables a practical visualization of the data, statistical analyses of the features and their events, and the manual or automatic detection of interesting situations for subsequent detailed investigation. The concepts of the novel algorithm, its technical realization, and several extensions for avoiding under- and over-segmentation are discussed. As example applications, this thesis covers the setup and results of the segmentation of upper-tropospheric jet streams and cyclones as full 3D objects. Finally, IWAL is presented, a web application that provides easy interactive access to meteorological data visualizations and is primarily aimed at students. Because it is a web application, users need not retrieve all input data sets or install and handle complex visualization tools on a local machine. The main challenge in providing customizable visualizations to large numbers of simultaneous users was to find an acceptable trade-off between the available visualization options and the performance of the application. In addition to the implementation details, benchmarks and the results of a user survey are presented.

Relevance:

30.00%

Abstract:

A study of the polarimetric backscattering response of newly formed sea ice types under a wide range of surface coverage was conducted using a ship-based C-band polarimetric radar system. Polarimetric backscattering results and physical data for 40 stations during the fall freeze-ups of 2003, 2006, and 2007 are presented. Analysis of the copolarized correlation coefficient showed its sensitivity to both sea ice thickness and surface coverage and resulted in a statistically significant separation of ice thickness into two regimes: ice less than 6 cm thick and ice greater than 8 cm thick. A case study quantified the backscatter of a layer of snow-infiltrated frost flowers on new sea ice, showing that the presence of the old frost flowers can enhance the backscatter by more than 6 dB. Finally, a statistical analysis of a series of temporal-spatial measurements over a visually homogeneous frost-flower-covered ice floe identified temperature as a significant, but not exclusive, factor in the backscattering measurements.
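For reference, the copolarized correlation coefficient is commonly defined from the complex HH and VV scattering amplitudes as shown below. This is the standard textbook form, given here as an assumption: the abstract does not state which estimator or averaging window the authors used. Angle brackets denote averaging over independent samples, and the asterisk denotes complex conjugation.

```latex
\rho_{hhvv} =
  \frac{\left| \left\langle S_{hh} S_{vv}^{*} \right\rangle \right|}
       {\sqrt{\left\langle |S_{hh}|^{2} \right\rangle
              \left\langle |S_{vv}|^{2} \right\rangle}},
\qquad 0 \le \rho_{hhvv} \le 1
```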

Relevance:

30.00%

Abstract:

We studied the impact of the last glacial (late Weichselian) sea level cycle on sediment architecture in the inner Kara Sea using high-resolution acoustic sub-bottom profiling. The acoustic lines were ground-truthed with dated sediment cores. Furthermore, we refined the location of the eastern Last Glacial Maximum (LGM) ice margin using new sub-bottom profiles. New model results of post-LGM isostatic rebound for this area allow a well-constrained interpretation of acoustic units in terms of sequence stratigraphy. The lowstand (or regressive) system tract sediments are absent but are represented by an unconformity atop Pleistocene sediments on the shelf and by a major incised dendritic paleo-river network. The subsequent transgressive and highstand system tracts are best preserved in the incised channels and the recent estuaries, while only minor sediment accumulation is documented on the adjacent shelf areas. The Kara Sea can be subdivided into three areas: estuaries (A), the shelf (B) and deeper-lying areas (C), which accumulated a total of 114 × 10^10 t of Holocene sediments.

Relevance:

30.00%

Abstract:

In this paper, a novel approach for obtaining 3D models from video sequences captured with hand-held cameras is presented. We define a pipeline that deals robustly with different types of sequences and acquisition devices. Our system follows a divide-and-conquer approach: after a frame decimation that pre-conditions the input sequence, the video is split into short clips. This allows the reconstruction step to be parallelized, which reduces the amount of computational resources required. The short length of the clips allows an intensive search for the best solution at each step of the reconstruction, which makes the system more robust. Unlike in other approaches, feature tracking is embedded within the reconstruction loop for each clip. A final registration step merges all the processed clips into the same coordinate frame.
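The divide-and-conquer structure of such a pipeline can be sketched as follows. Everything specific here is an assumption made for illustration: uniform decimation (keeping every k-th frame), a fixed clip length with a small overlap between neighbouring clips, and a placeholder `reconstruct_clip` function standing in for the per-clip tracking and reconstruction, which the abstract does not detail.

```python
from concurrent.futures import ThreadPoolExecutor

def decimate(frames, keep_every):
    """Frame decimation (assumed uniform): keep every `keep_every`-th frame."""
    return frames[::keep_every]

def split_into_clips(frames, clip_len, overlap):
    """Split a frame list into short clips that share `overlap` frames."""
    if not 0 <= overlap < clip_len:
        raise ValueError("overlap must satisfy 0 <= overlap < clip_len")
    step = clip_len - overlap
    clips = []
    for start in range(0, len(frames), step):
        clips.append(frames[start:start + clip_len])
        if start + clip_len >= len(frames):
            break
    return clips

def reconstruct_clip(clip):
    """Placeholder for per-clip feature tracking and 3D reconstruction."""
    return {"first": clip[0], "last": clip[-1], "n_frames": len(clip)}

frames = list(range(100))              # stand-in for 100 video frames
kept = decimate(frames, keep_every=2)  # 50 frames remain
clips = split_into_clips(kept, clip_len=20, overlap=5)

# Clips are independent, so they can be processed in parallel.
with ThreadPoolExecutor() as pool:
    partial_models = list(pool.map(reconstruct_clip, clips))
```

The shared frames between neighbouring clips are one plausible way to give the final registration step common data to align on; the paper may use a different mechanism.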

Relevance:

30.00%

Abstract:

This research proposes a generic methodology for dimensionality reduction of time-frequency representations, applied to the classification of different types of biosignals. The methodology deals directly with the highly redundant and irrelevant data contained in these representations, combining a first stage of irrelevant data removal by variable selection with a second stage of redundancy reduction using methods based on linear transformations. The study addresses two techniques that provided similar performance: the first is based on the selection of a set of the most relevant time-frequency points, whereas the second selects the most relevant frequency bands. The first technique needs fewer components, leading to a lower-dimensional feature space, but the second better captures the time-varying dynamics of the signal and therefore provides more stable performance. In order to evaluate the generalization capabilities of the proposed methodology, it has been applied to two types of biosignals with different kinds of non-stationary behavior: electroencephalographic and phonocardiographic biosignals. Even though these two databases contain samples with different degrees of complexity and a wide variety of characterizing patterns, the results demonstrate good accuracy for the detection of pathologies, over 98%. The results open the possibility of extending the methodology to the study of other biosignals.
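The two-stage scheme can be sketched as below. The abstract does not name the relevance measure or the linear transformation, so this example assumes absolute correlation with the class label for stage one and PCA for stage two. Each row of `X` stands for one recording's time-frequency map flattened into a vector, and the data are synthetic.

```python
import numpy as np

def select_relevant(X, y, keep):
    """Stage 1: keep the `keep` columns most correlated (in magnitude) with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant columns get zero relevance
    relevance = np.abs(Xc.T @ yc) / denom
    return np.sort(np.argsort(relevance)[::-1][:keep])

def pca_project(X, n_components):
    """Stage 2: project onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

# Synthetic data: 200 recordings, 50 time-frequency points,
# of which only the first 5 depend on the class label.
rng = np.random.default_rng(0)
n, p = 200, 50
y = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, p))
X[:, :5] += 2.0 * y[:, None]

idx = select_relevant(X, y, keep=10)        # irrelevant-variable removal
Z = pca_project(X[:, idx], n_components=3)  # redundancy reduction
```

The resulting matrix `Z` has 3 columns instead of 50 and would be the input to the classifier. Selecting whole frequency bands instead of individual points would change only stage one.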