10 results for Feature Extraction
at Brock University, Canada
Abstract:
Remote sensing techniques involving hyperspectral imagery have applications in a number of sciences that study aspects of the surface of the planet. The analysis of hyperspectral images is complex because of the large amount of information involved and the noise within that data. Investigating images to identify minerals, rocks, vegetation and other materials is one application of hyperspectral remote sensing in the earth sciences. This thesis evaluates the performance of two classification and clustering techniques on hyperspectral images for mineral identification. Support Vector Machines (SVM) and Self-Organizing Maps (SOM) are applied as classification and clustering techniques, respectively. Principal Component Analysis (PCA) is used to prepare the data to be analyzed. The purpose of using PCA is to reduce the amount of data that needs to be processed by identifying the most important components within the data. A well-studied dataset from Cuprite, Nevada and a dataset of more complex data from Baffin Island were used to assess the performance of these techniques. The main goal of this research study is to evaluate the advantage of training a classifier based on a small amount of data compared to an unsupervised method. Determining the effect of feature extraction on the accuracy of the clustering and classification methods is another goal of this research. This thesis concludes that using PCA increases the learning accuracy, especially in classification. SVM classifies the Cuprite data with high precision, and the SOM challenges SVM on datasets with a high level of noise (like Baffin Island).
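The PCA step described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function names `pca_first_component` and `project` are hypothetical, the input is assumed to be a list of per-pixel spectra, and power iteration is swapped in for a full eigen-decomposition so that only the leading component is recovered.

```python
import math
import random

def pca_first_component(rows, iterations=200, seed=0):
    """Return (mean, unit direction) of the first principal component.

    rows: list of equal-length lists, one spectrum per pixel (assumed layout).
    Power iteration on the covariance matrix stands in for a full PCA solver.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(row, mean, direction):
    """Score of one spectrum along the principal direction."""
    return sum((x - m) * u for x, m, u in zip(row, mean, direction))
```

Projecting every spectrum onto the leading directions replaces many correlated bands with a few scores, which is the data reduction the abstract refers to; a classifier or clustering method would then be trained on those scores.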
Abstract:
An efficient way of synthesizing the deuterium-labelled analogues of three methoxypyrazine compounds, 2-d3-methoxy-3-isopropylpyrazine, 2-d3-methoxy-3-isobutylpyrazine, and 2-d3-methoxy-3-secbutylpyrazine, has been developed. To confirm that the deuterium labels had been incorporated into the expected positions in the molecules synthesized, the relevant characterization by NMR, HRMS and GC/MS analysis was conducted. Another part of this work involved quantitative determination of methoxypyrazines in water and wines. Solid-phase extraction (SPE) proved to be a suitable means for sample separation and concentration prior to GC/MS analysis. Factors that can influence the recovery of the pyrazines from the water matrix by SPE, such as the presence of ethanol, salt, and acid, were investigated. Significantly, in this work a comparatively simple fractional distillation was attempted as a replacement for the conventional steam distillation for pre-concentrating a sample with a relatively large volume prior to SPE. Finally, a real wine sample spiked with the relevant isotope-labelled methoxypyrazines was quantitatively analyzed, revealing that the wine with 10 beetles per litre contained 138 ppt of 2-methoxy-3-isopropylpyrazine. Interestingly, we have also found that 2-methoxy-3-secbutylpyrazine exhibits an extremely low detection limit in GC/MS analysis compared with the detection limits of the other two methoxypyrazines, 2-methoxy-3-isopropylpyrazine and 2-methoxy-3-isobutylpyrazine.
Abstract:
Factors affecting the determination of PAHs by capillary GC/MS were studied. The effect of the initial column temperature and the injection solvent on the peak areas and heights of sixteen PAHs, considered priority pollutants, using crosslinked methyl silicone (DB1) and 5% diphenyl, 94% dimethyl, 1% vinyl polysiloxane (DB5) columns was examined. The possibility of using high-boiling-point alcohols, especially butanol, pentanol, cyclopentanol, and hexanol, as injection solvents was investigated. Studies were carried out to optimize the initial column temperature for each of the alcohols. It was found that the optimum initial column temperature is dependent on the solvent employed. The peak areas and heights of the PAHs are enhanced when the initial column temperature is 10-20 °C above the boiling point of the solvent using the DB5 column, and the same as or 10 °C above the boiling point of the solvent using the DB1 column. Comparing the peak signals of the PAHs using the alcohols, p-xylene, n-octane, and nonane as injection solvents, hexanol gave the greatest peak areas and heights of the PAHs, particularly for the late-eluted peaks. The detection limits were at low pg levels, ranging from 6.0 pg for fluorene to 83.6 pg for benzo(a)pyrene. The effect of the initial column temperature on the peak shape and the separation efficiency of the PAHs was also studied using DB1 and DB5 columns. Fronting or splitting of the peaks was observed at very low initial column temperatures. When a high initial column temperature was used, tailing of the peaks appeared. A large difference was observed between the DB1 and DB5 columns in the range of initial column temperatures over which symmetrical peaks of PAHs can be obtained; wider ranges were shown using the DB5 column. Resolution of the closely-eluted PAHs was also affected by the initial column temperature, depending on the stationary phase employed. In the case of DB5, only the early-eluted PAHs were affected, whereas with DB1 all PAHs were affected.
An analytical procedure utilizing solid phase extraction with bonded phase silica (C8) cartridges combined with GC/MS was developed to analyze PAHs in water as an alternative to methods based on extraction with organic solvent. This simple procedure involved passing 50 ml of spiked water sample through C8 bonded phase silica cartridges at 10 ml/min, drying the cartridges with a gentle flow of nitrogen at 20 ml/min for 30 sec, and eluting the trapped PAHs with 500 µl of p-xylene at 0.3 ml/min. The recoveries of PAHs were greater than 80%, with relative standard deviations of less than 10% for nine determinations. No major contaminants were present that could interfere with the recognition of PAHs. It was also found that these bonded phase silica cartridges can be re-used for the extraction of PAHs from water.
Abstract:
Order parameter profiles extracted from the NMR spectra of model membranes are a valuable source of information about their structure and molecular motions. To analyze powder spectra the de-Pake-ing (numerical deconvolution) technique can be used, but it assumes a random (spherical) distribution of orientations in the sample. Multilamellar vesicles are known to deform and orient in the strong magnetic fields of NMR magnets, producing non-spherical orientation distributions. A recently developed technique for simultaneously extracting the anisotropies of the system as well as the orientation distributions is applied to the analysis of partially magnetically oriented 31P NMR spectra of phospholipids. A mixture of synthetic lipids, POPE and POPG, is analyzed to measure distortion of multilamellar vesicles in a magnetic field. In the analysis three models describing the shape of the distorted vesicles are examined. Ellipsoids of rotation with a semiaxis ratio of about 1.14 are found to provide a good approximation of the shape of the distorted vesicles. This is in reasonable agreement with published experimental work. All three models yield clearly non-spherical orientational distributions, as well as a precise measure of the anisotropy of the chemical shift. Noise in the experimental data prevented the analysis from concluding which of the three models is the best approximation. A discretization scheme for finding stability in the algorithm is outlined.
Abstract:
A feature-based fitness function is applied in a genetic programming system to synthesize stochastic gene regulatory network models whose behaviour is defined by a time course of protein expression levels. Typically, when targeting time series data, the fitness function is based on a sum-of-errors involving the values of the fluctuating signal. While this approach is successful in many instances, its performance can deteriorate in the presence of noise. This thesis explores a fitness measure determined from a set of statistical features characterizing the time series' sequence of values, rather than the actual values themselves. Through a series of experiments involving symbolic regression with added noise and gene regulatory network models based on the stochastic π-calculus, it is shown to successfully target oscillating and non-oscillating signals. This practical and versatile fitness function offers an alternative approach, worthy of consideration for use in algorithms that evaluate noisy or stochastic behaviour.
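The contrast between a sum-of-errors fitness and a feature-based fitness can be sketched as follows. The abstract does not list the statistical features used, so the set below (mean, standard deviation, extremes, and mean crossings) is an assumption chosen for illustration, and all function names are hypothetical.

```python
import math
import statistics

def sine_wave(n, period, phase):
    """Sampled sine signal used as a stand-in for an oscillating time course."""
    return [math.sin(2 * math.pi * i / period + phase) for i in range(n)]

def features(series):
    """Summarize a time series by a few statistics (an assumed, illustrative set)."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # Count of crossings through the mean: a crude indicator of oscillation.
    crossings = sum(
        1 for a, b in zip(series, series[1:]) if (a - mean) * (b - mean) < 0
    )
    return [mean, std, min(series), max(series), float(crossings)]

def feature_fitness(candidate, target):
    """Sum of absolute feature differences; lower is better."""
    return sum(abs(c - t) for c, t in zip(features(candidate), features(target)))

def sum_of_errors(candidate, target):
    """Conventional point-by-point error, shown for comparison."""
    return sum(abs(c - t) for c, t in zip(candidate, target))
```

For example, `sine_wave(200, 20, 0.3)` and a phase-shifted copy `sine_wave(200, 20, 1.3)` have nearly identical features, so `feature_fitness` stays small while `sum_of_errors` is large. The phase shift here only stands in for a signal whose timing drifts; it is not a model of the noise used in the thesis.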
Abstract:
Second-rank tensor interactions, such as quadrupolar interactions between the spin-1 deuterium nuclei and the electric field gradients created by chemical bonds, are affected by rapid random molecular motions that modulate the orientation of the molecule with respect to the external magnetic field. In biological and model membrane systems, where a distribution of dynamically averaged anisotropies (quadrupolar splittings, chemical shift anisotropies, etc.) is present and where, in addition, various parts of the sample may undergo a partial magnetic alignment, the numerical analysis of the resulting Nuclear Magnetic Resonance (NMR) spectra is a mathematically ill-posed problem. However, numerical methods (de-Pakeing, Tikhonov regularization) exist that allow for a simultaneous determination of both the anisotropy and orientational distributions. An additional complication arises when relaxation is taken into account. This work presents a method of obtaining the orientation dependence of the relaxation rates that can be used for the analysis of the molecular motions on a broad range of time scales. An arbitrary set of exponential decay rates is described by a three-term truncated Legendre polynomial expansion in the orientation dependence, as appropriate for a second-rank tensor interaction, and a linear approximation to the individual decay rates is made. Thus a severe numerical instability caused by the presence of noise in the experimental data is avoided. At the same time, enough flexibility in the inversion algorithm is retained to achieve a meaningful mapping from raw experimental data to a set of intermediate, model-free
Abstract:
The curse of dimensionality is a major problem in the fields of machine learning, data mining and knowledge discovery. Exhaustive search for the optimal subset of relevant features from a high-dimensional dataset is NP-hard. Sub-optimal population-based stochastic algorithms such as GP and GA are good choices for searching through large search spaces, and are usually more feasible than exhaustive and deterministic search algorithms. On the other hand, population-based stochastic algorithms often suffer from premature convergence on mediocre sub-optimal solutions. The Age Layered Population Structure (ALPS) is a novel metaheuristic for overcoming the problem of premature convergence in evolutionary algorithms, and for improving search in the fitness landscape. The ALPS paradigm uses an age measure to control breeding and competition between individuals in the population. This thesis uses a modification of the ALPS GP strategy called Feature Selection ALPS (FSALPS) for feature subset selection and classification of varied supervised learning tasks. FSALPS uses a novel frequency count system to rank features in the GP population based on evolved feature frequencies. The ranked features are translated into probabilities, which are used to control evolutionary processes such as terminal-symbol selection for the construction of GP trees/sub-trees. The FSALPS metaheuristic continuously refines the feature subset selection process while simultaneously evolving efficient classifiers through a non-converging evolutionary process that favors selection of features with high discrimination of class labels. We investigated and compared the performance of canonical GP, ALPS and FSALPS on high-dimensional benchmark classification datasets, including a hyperspectral image. Using Tukey's HSD ANOVA test at a 95% confidence level, ALPS and FSALPS dominated canonical GP in evolving smaller but efficient trees with fewer bloat expressions.
FSALPS significantly outperformed canonical GP and ALPS and some reported feature selection strategies in related literature on dimensionality reduction.
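The idea of turning feature frequencies into terminal-selection probabilities can be sketched as follows. This is an illustration under stated assumptions, not the FSALPS algorithm itself: individuals are flattened to lists of feature indices rather than GP trees, the `floor` smoothing term is an addition of this sketch, and the function names are hypothetical.

```python
import random
from collections import Counter

def feature_probabilities(population, n_features, floor=1.0):
    """Turn feature usage counts into selection probabilities.

    population: list of individuals, each a list of the feature indices it
    uses as terminals (a flattened stand-in for GP trees).
    floor: smoothing count added to every feature so that unused features
    keep a nonzero chance (an assumption of this sketch).
    """
    counts = Counter(f for individual in population for f in individual)
    weights = [counts.get(f, 0) + floor for f in range(n_features)]
    total = sum(weights)
    return [w / total for w in weights]

def pick_terminal(probabilities, rng):
    """Sample one feature index according to the current probabilities."""
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]
```

Recomputing the probabilities each generation would let features that appear often in the evolved population be drawn more often when new trees or sub-trees are built, which is the feedback loop the abstract describes.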
Abstract:
Digital Terrain Models (DTMs) are important in geology and geomorphology, since elevation data contains a lot of information pertaining to geomorphological processes that influence the topography. The first derivative of topography is attitude; the second is curvature. GIS tools were developed for derivation of strike, dip, curvature and curvature orientation from Digital Elevation Models (DEMs). A method for displaying both strike and dip simultaneously as a colour-coded visualization (AVA) was implemented. A plug-in for calculating strike and dip via Least Squares Regression was created first using VB.NET. Further research produced a more computationally efficient solution, convolution filtering, which was implemented as Python scripts. These scripts were also used for calculation of curvature and curvature orientation. The application of these tools was demonstrated by performing morphometric studies on datasets from Earth and Mars. The tools show promise; however, more work is needed to explore their full potential and possible uses.
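Deriving strike and dip from a DEM with a 3x3 filter can be sketched as follows. The abstract does not give the kernels or conventions used, so everything here is an assumption: Sobel-style weights, a grid whose row 0 is the northern edge and column 0 the western edge, and strike reported by the right-hand rule (dip direction is 90° clockwise from strike). The function names are hypothetical.

```python
import math

# Sobel-style 3x3 weights (assumed; the abstract does not list its kernels).
# Row 0 of the grid is taken as the northern edge, column 0 as the western edge.
KERNEL_EAST = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KERNEL_NORTH = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]

def window_sum(dem, kernel, r, c):
    """Apply a 3x3 kernel to the window centred on interior cell (r, c)."""
    return sum(kernel[i][j] * dem[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))

def strike_dip(dem, cell_size, r, c):
    """Return (strike, dip) in degrees at interior cell (r, c).

    Strike is None where the surface is flat and no direction is defined.
    """
    grad_east = window_sum(dem, KERNEL_EAST, r, c) / (8.0 * cell_size)
    grad_north = window_sum(dem, KERNEL_NORTH, r, c) / (8.0 * cell_size)
    dip = math.degrees(math.atan(math.hypot(grad_east, grad_north)))
    if dip == 0.0:
        return None, 0.0
    # Downslope azimuth, measured clockwise from north.
    dip_direction = math.degrees(math.atan2(-grad_east, -grad_north)) % 360.0
    strike = (dip_direction - 90.0) % 360.0
    return strike, dip
```

Because each cell needs only two fixed-weight window sums, the same pass can be run over a whole raster without fitting a plane per cell, which is one plausible reason a filtering approach is cheaper than least squares regression.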
Abstract:
New Feature at Niagara – Clark Hill Islands (5 islands situated in the rapids of the Niagara River). These islands are currently known as Dufferin Islands, 22 ½ cm. x 15 ½ cm, n.d.