19 results for two-dimensional principal component analysis (2DPCA)
in Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland
Abstract:
This work is devoted to the problem of reconstructing the basis weight structure of the paper web with black-box techniques. The data that is analyzed comes from a real paper machine and is collected by an off-line scanner. The principal mathematical tool used in this work is Autoregressive Moving Average (ARMA) modelling. When coupled with the Discrete Fourier Transform (DFT), it gives a very flexible and interesting tool for analyzing properties of the paper web. Both ARMA and DFT are independently used to represent the given signal in a simplified version of our algorithm, but the final goal is to combine the two together. The Ljung-Box Q-statistic lack-of-fit test combined with the Root Mean Squared Error coefficient gives a tool to separate significant signals from noise.
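A minimal sketch of this kind of analysis, assuming a synthetic one-dimensional signal in place of the scanner data: an ARMA fit via statsmodels, a DFT of the signal, and the Ljung-Box Q-statistic together with an RMSE coefficient on the residuals. The model orders are illustrative, not those of the thesis.

```python
# Sketch only: synthetic signal standing in for a basis weight profile.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(0)
t = np.arange(512)
signal = np.sin(2 * np.pi * 0.05 * t) + 0.3 * rng.standard_normal(t.size)

# ARMA(p, q) is ARIMA with d = 0; the orders here are illustrative only.
fit = ARIMA(signal, order=(2, 0, 1)).fit()
residuals = fit.resid

# DFT of the signal to inspect the dominant frequency content.
spectrum = np.abs(np.fft.rfft(signal))
print("dominant frequency bin:", np.argmax(spectrum[1:]) + 1)

# Ljung-Box lack-of-fit test on the residuals plus an RMSE coefficient:
# small p-values indicate structure the model has not captured.
print(acorr_ljungbox(residuals, lags=[10]))
print("RMSE:", np.sqrt(np.mean(residuals ** 2)))
```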
Abstract:
Due to the large number of characteristics, there is a need to extract the most relevant characteristics from the input data, so that the amount of information lost in this way is minimal and the classification realized with the projected data set is relevant with respect to the original data. In order to achieve this feature extraction, different statistical techniques, as well as principal component analysis (PCA), may be used. This thesis describes an extension of principal component analysis (PCA) allowing the extraction of a finite number of relevant features from high-dimensional fuzzy and noisy data. PCA finds linear combinations of the original measurement variables that describe the significant variation in the data. The comparison of the two proposed methods was carried out using postoperative patient data. Experimental results demonstrate the ability of the two proposed methods to handle complex data. Fuzzy PCA was used in the classification problem. The classification was performed with the similarity classifier algorithm, where the weights of the total similarity measure are optimized with a differential evolution algorithm. This thesis presents the comparison of the classification results based on the data obtained from the fuzzy PCA.
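For orientation, a minimal sketch of ordinary PCA feature extraction with scikit-learn on random data; the fuzzy-PCA extension for fuzzy and noisy inputs described above is not shown here.

```python
# Sketch only: ordinary PCA on random data in place of the fuzzy/noisy inputs.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 20))        # 100 samples, 20 original variables

pca = PCA(n_components=5)                 # keep 5 extracted features
X_proj = pca.fit_transform(X)             # linear combinations of the originals
print("retained variance:", pca.explained_variance_ratio_.sum())
print("projected shape:", X_proj.shape)
```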
Abstract:
Identification of low-dimensional structures and main sources of variation from multivariate data are fundamental tasks in data analysis. Many methods aimed at these tasks involve solution of an optimization problem. Thus, the objective of this thesis is to develop computationally efficient and theoretically justified methods for solving such problems. Most of the thesis is based on a statistical model, where ridges of the density estimated from the data are considered as relevant features. Finding ridges, which are generalized maxima, necessitates development of advanced optimization methods. An efficient and convergent trust region Newton method for projecting a point onto a ridge of the underlying density is developed for this purpose. The method is utilized in a differential equation-based approach for tracing ridges and computing projection coordinates along them. The density estimation is done nonparametrically by using Gaussian kernels. This allows application of ridge-based methods with only mild assumptions on the underlying structure of the data. The statistical model and the ridge finding methods are adapted to two different applications. The first one is extraction of curvilinear structures from noisy data mixed with background clutter. The second one is a novel nonlinear generalization of principal component analysis (PCA) and its extension to time series data. The methods have a wide range of potential applications, where most of the earlier approaches are inadequate. Examples include identification of faults from seismic data and identification of filaments from cosmological data. Applicability of the nonlinear PCA to climate analysis and reconstruction of periodic patterns from noisy time series data are also demonstrated. Other contributions of the thesis include development of an efficient semidefinite optimization method for embedding graphs into the Euclidean space. The method produces structure-preserving embeddings that maximize interpoint distances. It is primarily developed for dimensionality reduction, but also has potential applications in graph theory and various areas of physics, chemistry and engineering. Asymptotic behaviour of ridges and maxima of Gaussian kernel densities is also investigated when the kernel bandwidth approaches infinity. The results are applied to the nonlinear PCA and to finding significant maxima of such densities, which is a typical problem in visual object tracking.
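As a rough illustration of the quantities a ridge-projection method needs, the sketch below evaluates an (unnormalized) Gaussian kernel density together with its gradient and Hessian at a point, and checks a simple ridge condition in two dimensions. It uses synthetic data and does not reproduce the trust region Newton method of the thesis.

```python
# Sketch only: Gaussian KDE value, gradient and Hessian at a point, and a
# simple check of the ridge condition in 2-D (gradient orthogonal to the
# eigenvector of the most negative Hessian eigenvalue).
import numpy as np

def kde_value_grad_hess(x, data, h):
    """Unnormalized Gaussian KDE and its first two derivatives at x."""
    u = (data - x) / h                             # (n, d) scaled differences
    w = np.exp(-0.5 * np.sum(u ** 2, axis=1))      # kernel weights
    f = w.sum()
    grad = (u * w[:, None]).sum(axis=0) / h
    hess = (np.einsum('ni,nj,n->ij', u, u, w) - np.eye(x.size) * f) / h ** 2
    return f, grad, hess

rng = np.random.default_rng(2)
data = rng.standard_normal((200, 2))               # toy point cloud
f, g, H = kde_value_grad_hess(np.zeros(2), data, h=0.5)

vals, vecs = np.linalg.eigh(H)                     # eigenvalues in ascending order
print("gradient component across the ridge:", vecs[:, 0] @ g)  # ~0 on a ridge
```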
Abstract:
This master's thesis considers new methods for independent component analysis (ICA). The methods are based on colligation and on cross-moments. The colligation method is based on the colligation of weights; instead of a single probability distribution, two types of probability distributions are used, which rests on a general independence criterion. The colligation approach is used with two asymptotic representations: Gram-Charlier and Edgeworth expansions are used to approximate the probability densities in these methods. A cross-moment method based on the fourth-order cross-moment is also used; it is very similar to the FastICA algorithm. Both methods are examined on a linear mixture of two independent variables. The source signals and the mixing matrices are unknown, apart from the number of signal sources. The colligation method and its modifications are compared to FastICA and JADE. A comparative analysis of performance and CPU time is also carried out between the cross-moment based methods, FastICA and JADE for several mixed pairs.
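A minimal sketch of the reference setting, assuming synthetic sources: a linear mixture of two independent signals separated with scikit-learn's FastICA, the algorithm against which the colligation and cross-moment methods are compared. The colligation and cross-moment estimators themselves are not shown.

```python
# Sketch only: synthetic sources and an assumed mixing matrix.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(3)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.cos(3 * t))]   # two independent sources
A = np.array([[1.0, 0.5], [0.4, 1.0]])             # mixing matrix (unknown in practice)
X = S @ A.T                                        # observed mixtures

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                       # sources up to scale and order
print("estimated mixing matrix:\n", ica.mixing_)
```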
Abstract:
Raw measurement data does not always immediately convey useful information, but applying mathematical and statistical analysis tools to the measurement data can improve the situation. Data analysis can offer benefits like acquiring meaningful insight from the dataset, basing critical decisions on the findings, and ruling out human bias through proper statistical treatment. In this thesis we analyze data from an industrial mineral processing plant with the aim of studying the possibility of forecasting the quality of the final product, given by one variable, with a model based on the other variables. For the study, mathematical tools like Qlucore Omics Explorer (QOE) and Sparse Bayesian regression (SB) are used. Later on, linear regression is used to build a model based on a subset of variables that have the most significant weights in the SB model. The results obtained from QOE show that the variable representing the desired final product does not correlate with the other variables. For SB and linear regression, the results show that both SB and linear regression models built on 1-day averaged data seriously underestimate the variance of the true data, whereas the two models built on 1-month averaged data are reliable and able to explain a larger proportion of the variability in the available data, making them suitable for prediction purposes. However, it is concluded that no single model fits the whole available dataset well. Therefore, it is proposed as future work to build piecewise nonlinear regression models if the same dataset is used, or for the plant to provide another dataset, collected in a more systematic fashion than the present data, for further analysis.
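A minimal sketch of the two-stage idea on synthetic data, with scikit-learn's ARDRegression standing in for the Sparse Bayesian regression used in the thesis: weight the variables, then fit an ordinary linear model on the most heavily weighted subset.

```python
# Sketch only: ARDRegression as a stand-in for the Sparse Bayesian model.
import numpy as np
from sklearn.linear_model import ARDRegression, LinearRegression

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 15))                 # 15 plant variables (synthetic)
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.1 * rng.standard_normal(200)

sb = ARDRegression().fit(X, y)                     # sparse Bayesian-style weights
top = np.argsort(np.abs(sb.coef_))[-3:]            # most significant variables
lin = LinearRegression().fit(X[:, top], y)         # reduced linear model
print("selected variables:", top, "R^2:", lin.score(X[:, top], y))
```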
Abstract:
The aim of this master's thesis is to develop a two-dimensional drift-diffusion model, which describes charge transport in organic solar cells. The main benefit of a two-dimensional model compared to a one-dimensional one is the inclusion of the nanoscale morphology of the active layer of a bulk heterojunction solar cell. The developed model was used to study recombination dynamics at the donor-acceptor interface. In some cases, it was possible to determine effective parameters, which reproduce the results of the two-dimensional model in the one-dimensional case. A summary of the theory of charge transport in semiconductors was presented and discussed in the context of organic materials. Additionally, the normalization and discretization procedures required to find a numerical solution to the charge transport problem were outlined. The charge transport problem was solved by implementing an iterative scheme called successive over-relaxation. The obtained solution is given as position-dependent electric potential, free charge carrier concentrations and current densities in the active layer. An interfacial layer, separating the pure phases, was introduced in order to describe charge dynamics occurring at the interface between the donor and acceptor. For simplicity, an effective generation of free charge carriers in the interfacial layer was implemented. The pure phases simply act as transport layers for the photogenerated charges. Langevin recombination was assumed in the two-dimensional model and an analysis of the apparent recombination rate in the one-dimensional case is presented. The recombination rate in a two-dimensional model is seen to effectively look like reduced Langevin recombination at open circuit. Replicating the J-U curves obtained in the two-dimensional model is, however, not possible by introducing a constant reduction factor in the Langevin recombination rate. The impact of an acceptor domain in the pure donor phase was investigated. Two cases were considered, one where the acceptor domain is isolated and another where it is connected to the bulk of the acceptor. A comparison to the case where no isolated domains exist was done in order to quantify the observed reduction in the photocurrent. The results show that all charges generated at the isolated domain are lost to recombination, but the domain does not have a major impact on charge transport. Trap-assisted recombination at interfacial trap states was investigated, as well as the surface dipole caused by the trapped charges. A theoretical expression for the ideality factor n_id as a function of generation was derived and shown to agree with simulation data. When the theoretical expression was fitted to simulation data, no interface dipole was observed.
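A minimal sketch of successive over-relaxation, here applied to a toy 2-D Poisson problem for the electric potential on a unit square; the full drift-diffusion model additionally couples continuity equations for the charge carriers, which is not shown.

```python
# Sketch only: SOR iteration for a discretized Poisson equation on a unit square.
import numpy as np

n, omega, tol = 50, 1.8, 1e-6
phi = np.zeros((n, n))                             # potential, zero Dirichlet boundaries
rho = np.zeros((n, n))
rho[n // 2, n // 2] = 1.0                          # toy charge density
h2 = 1.0 / (n - 1) ** 2                            # grid spacing squared

for sweep in range(20000):
    max_diff = 0.0
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            gs = 0.25 * (phi[i + 1, j] + phi[i - 1, j]
                         + phi[i, j + 1] + phi[i, j - 1] + h2 * rho[i, j])
            new = (1 - omega) * phi[i, j] + omega * gs
            max_diff = max(max_diff, abs(new - phi[i, j]))
            phi[i, j] = new
    if max_diff < tol:                             # converged
        break
print("sweeps:", sweep + 1, "potential at the charge:", phi[n // 2, n // 2])
```

The relaxation factor omega between 1 and 2 accelerates the plain Gauss-Seidel update; omega = 1.8 is a typical but arbitrary choice here.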
Abstract:
In research on organizational trust, trust is usually seen as a phenomenon between persons, such as an employee's trust in co-workers, in the supervisor or in the immediate management. Organizational trust, however, also has a non-personified dimension, so-called institutional trust. So far only a few researchers have used institutional trust as part of organizational trust in their studies. The aim of this work is to develop the concept of institutional trust and an instrument for observing it in an organizational environment. The development process consisted of three phases. In the first phase, the items for the instrument were developed and content validity was assessed. The second phase comprised data collection, item reduction and comparison of alternative models. In the third phase, construct validity and reliability were assessed. The empirical part of the work was carried out as an internet survey among adult students. Principal component analysis and confirmatory factor analysis were used to analyze the data. Institutional trust consists of two dimensions: capability and fairness. Capability consists of five sub-components: organization of operational activities, organizational stability, capability in business and people management, technological reliability, and competitiveness. Fairness, in turn, consists of HRM practices, the spirit of fair play prevailing in the organization, and communication. The final instrument comprises 18 items for capability and 13 items for fairness. The instrument developed in this work enables better and more reliable measurement of organizational trust than before. To the researcher's knowledge, this is the first comprehensive instrument for measuring institutional trust.
Abstract:
Technological progress has made a huge amount of data available at increasing spatial and spectral resolutions. Therefore, the compression of hyperspectral data is an area of active research. In some fields, the original quality of a hyperspectral image cannot be compromised and in these cases, lossless compression is mandatory. The main goal of this thesis is to provide improved methods for the lossless compression of hyperspectral images. Both prediction- and transform-based methods are studied. Two kinds of prediction-based methods are studied. In the first method, the spectra of a hyperspectral image are first clustered and an optimized linear predictor is calculated for each cluster. In the second prediction method, the linear prediction coefficients are not fixed but are recalculated for each pixel. A parallel implementation of the above-mentioned linear prediction method is also presented. Two transform-based methods are also presented. Vector Quantization (VQ) was used together with a new coding of the residual image. In addition, we have developed a new back end for a compression method utilizing Principal Component Analysis (PCA) and Integer Wavelet Transform (IWT). The performance of the compression methods is compared to that of other compression methods. The results show that the proposed linear prediction methods outperform the previous methods. In addition, a novel fast exact nearest-neighbor search method is developed. The search method is used to speed up the Linde-Buzo-Gray (LBG) clustering method.
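A minimal sketch of the first prediction idea on synthetic spectra: cluster the spectra, then fit a least-squares linear predictor per cluster. In a real lossless compressor the integer residuals would be entropy coded, which is omitted here; this is not the thesis implementation.

```python
# Sketch only: per-cluster linear prediction of one band from the others.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
spectra = rng.random((500, 32))                    # 500 pixels, 32 spectral bands

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(spectra)
for c in range(4):
    S = spectra[labels == c]
    X, y = S[:, :-1], S[:, -1]                     # predict the last band from the rest
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # optimized linear predictor
    residual = y - X @ coef                        # would be entropy coded
    print(f"cluster {c}: mean |residual| = {np.mean(np.abs(residual)):.4f}")
```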
Abstract:
Quality of nursing care - the children's perspective. The purpose of this three-phase study was to describe children's expectations and evaluations of the quality of paediatric nursing care and to develop an instrument with which school-age children in hospital can evaluate that quality. The ultimate goal was to improve the quality of paediatric nursing care in hospital. In the first phase, 20 preschool-age (4-6 years) and 20 school-age (7-11 years) children described their expectations of the quality of paediatric nursing care. The data were collected through interviews and children's drawings, and analyzed with content analysis. The children's expectations concerned the nurse, nursing activities and the environment; the physical environment was emphasized in the drawings. Based on the results of the first phase, earlier literature and Leino-Kilpi's "HYVÄ HOITO" instrument, the "Lasten Hoidon Laatu Sairaalassa" (Quality of Children's Care in Hospital, LHLS) instrument was developed, and its psychometric properties were tested in the second phase of the study. The instrument was developed and tested in three stages. First, an expert panel (n=7) evaluated the content of the instrument. Next, the instrument was pilot tested twice with school-age children in hospital (n=41 and n=16); at the same stage, nurses from five paediatric wards (n=19), together with 8 children, also evaluated the content of the instrument. Finally, the instrument was tested with school-age children (n=388) in hospital, and nurses (n=198) evaluated its content validity. During the development of the instrument, the Cronbach's alpha coefficients of the main quality categories (characteristics of the nurse, nursing activities and the nursing environment) improved. Principal component analysis supported the theoretical structure of the subcategories of nursing activities and the environment. In the third phase, data were collected with the LHLS III instrument (version four) from school-age children aged 7-11 years (n=388) on the paediatric wards of Finnish university hospitals. At the end of the instrument, the children were also asked to describe their nicest and most unpleasant experiences during their hospital stay as a sentence completion task. The data were analyzed statistically and with content analysis. The children rated the physical care environment, the nurses' humanity and trustworthiness, and the caring and interaction activities as excellent. They rated the nurses' entertainment activities the lowest. The child's age and the mode of admission to hospital were associated with the amount of information the children received. The children's nicest experiences were related to people and their characteristics, to activities, to the environment and to outcomes. The most unpleasant experiences were related to being a patient, to the sensations caused by the symptoms of illness and by separation, to physical nursing activities and to the environment. The results of the study show that children are capable of evaluating their own care, and their perspective should be seen as part of the whole quality improvement process so that quality in practice is improved with a truly child-centred approach. The "Lasten Hoidon Laatu Sairaalassa" (LHLS) instrument is a potential tool for obtaining information on children's evaluations of the quality of paediatric nursing care, but testing of the instrument should be continued in the future.
Abstract:
Recent years have produced great advances in instrumentation technology. The amount of available data has been increasing due to the simplicity, speed and accuracy of current spectroscopic instruments. Most of these data are, however, meaningless without proper analysis. This has been one of the reasons for the ever-growing success of multivariate handling of such data. Industrial data are commonly not designed data; in other words, there is no exact experimental design, but rather the data have been collected as a routine procedure during an industrial process. This makes certain demands on the multivariate modeling, as the selection of samples and variables can have an enormous effect. Common approaches in the modeling of industrial data are PCA (principal component analysis) and PLS (projection to latent structures, or partial least squares), but there are also other methods that should be considered. The more advanced methods include multi-block modeling and nonlinear modeling. In this thesis it is shown that the results of data analysis vary according to the modeling approach used, thus making the selection of the modeling approach dependent on the purpose of the model. If the model is intended to provide accurate predictions, the approach should be different than in the case where the purpose of modeling is mostly to obtain information about the variables and the process. For industrial applicability it is essential that the methods are robust and sufficiently simple to apply. In this way the methods and the results can be compared and an approach selected that is suitable for the intended purpose. Differences between data analysis methods are compared with data from different fields of industry in this thesis. In the first two papers, the multi-block method is considered for data originating from the oil and fertilizer industries. The results are compared to those from PLS and priority PLS. The third paper considers the applicability of multivariate models to process control for a reactive crystallization process. In the fourth paper, nonlinear modeling is examined with a data set from the oil industry. The response has a nonlinear relation to the descriptor matrix, and the results are compared between linear modeling, polynomial PLS and nonlinear modeling using nonlinear score vectors.
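A minimal sketch contrasting a PCA-based regression with PLS on the same synthetic data, illustrating the point that the choice of modeling approach affects the result; it does not reproduce the multi-block or nonlinear methods of the thesis.

```python
# Sketch only: principal component regression versus PLS on the same data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(6)
X = rng.standard_normal((150, 10))
y = X[:, 0] - 0.5 * X[:, 4] + 0.2 * rng.standard_normal(150)

scores = PCA(n_components=3).fit_transform(X)      # unsupervised projection
pcr = LinearRegression().fit(scores, y)            # regression on PCA scores
pls = PLSRegression(n_components=3).fit(X, y)      # supervised latent structures

print("PCR R^2:", pcr.score(scores, y))
print("PLS R^2:", pls.score(X, y))
```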
Abstract:
The uncertainty of any analytical determination depends on both the analysis and the sampling. Uncertainty arising from sampling is usually not controlled, and methods for its evaluation are still little known. Pierre Gy's sampling theory is currently the most complete theory of sampling, and it also takes the design of the sampling equipment into account. Guides dealing with the practical issues of sampling also exist, published by international organizations such as EURACHEM, IUPAC (International Union of Pure and Applied Chemistry) and ISO (International Organization for Standardization). In this work Gy's sampling theory was applied to several cases, including the analysis of chromite concentration estimated from SEM (Scanning Electron Microscope) images and the estimation of the total uncertainty of a drug dissolution procedure. The results clearly show that Gy's sampling theory can be utilized in both of the above-mentioned cases and that the uncertainties achieved are reliable. Variographic experiments, introduced in Gy's sampling theory, are beneficially applied in analyzing the uncertainty of auto-correlated data sets such as industrial process data and environmental discharges. The periodic behaviour of these kinds of processes can be observed by variographic analysis as well as with the fast Fourier transform and auto-correlation functions. With variographic analysis, the uncertainties are estimated as a function of the sampling interval. This is advantageous when environmental or process data are analyzed, as it can easily be estimated how the sampling interval affects the overall uncertainty. If the sampling frequency is too high, unnecessary resources will be used. On the other hand, if the frequency is too low, the uncertainty of the determination may be unacceptably high. Variographic methods can also be utilized to estimate the uncertainty of spectral data produced by modern instruments. Since spectral data are multivariate, methods such as Principal Component Analysis (PCA) are needed when the data are analyzed. Optimization of a sampling plan increases the reliability of the analytical process, which might in the end have beneficial effects on the economics of chemical analysis.
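A minimal sketch of a variographic experiment on a synthetic auto-correlated AR(1) series: the experimental variogram (semivariance) as a function of the sampling lag, from which the effect of the sampling interval on uncertainty can be judged.

```python
# Sketch only: experimental variogram of a synthetic AR(1) process.
import numpy as np

rng = np.random.default_rng(7)
x = np.zeros(2000)
for t in range(1, x.size):                         # auto-correlated process data
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()

def variogram(series, max_lag):
    """Semivariance as a function of the sampling lag."""
    return np.array([0.5 * np.mean((series[lag:] - series[:-lag]) ** 2)
                     for lag in range(1, max_lag + 1)])

v = variogram(x, 50)
print("semivariance at lags 1, 10, 50:", v[0], v[9], v[49])
```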
Abstract:
Some of the most interesting phenomena in present-day materials physics arise from an intricate interplay between myriads of electrons. High-temperature superconductors are the most famous example. Neither classical theories nor models in which the electrons are independent of one another can explain the astonishing effects seen in strongly correlated electron systems. In certain copper oxides, for example La2CuO4, the valence electrons are known to localize, one by one, onto the copper atoms in the compound's CuO2 planes as a result of a strong mutual interaction. The intrinsic magnetic moment of the charges, the spin, then plays a decisive role in the electrical and magnetic properties of the material, which in this example can be described with the Heisenberg model, the fundamental theoretical model of microscopic magnetism. But exactly why these compounds become superconducting when doped with excess charges is still an open question. My thesis investigates the influence of impurities on the magnetic properties of the Heisenberg model, a problem of both experimental and theoretical relevance. An established numerical method, a quantum mechanical Monte Carlo technique, has been used to carry out extensive computer simulations of the mathematical model on two dedicated Linux computer clusters. The work belongs to the field of computational physics. The theoretical models for strongly correlated electron systems, among them the Heisenberg model, are mathematically extremely involved and cannot be solved exactly. Analytical treatments mostly rest on assumptions and simplifications whose effects on the final result are often unclear. In that respect numerical studies can be exact, that is, they can treat the models as they are. Usually both approaches are needed. The common thread of the work has been to numerically test certain highly topical analytical predictions concerning the effects of impurities in the Heisenberg model. Some of them we have been able to confirm on the basis of very accurate data. But our results have also revealed errors in the analytical predictions, which have since been partly revised. Some of the numerical findings of the thesis have in turn stimulated entirely new theoretical studies.
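For reference, the nearest-neighbour Heisenberg Hamiltonian in its standard textbook form (not taken from the thesis), where J is the exchange coupling and the sum runs over nearest-neighbour spin pairs on the lattice:

```latex
% Standard nearest-neighbour Heisenberg Hamiltonian (textbook form);
% J is the exchange coupling and S_i the spin operator at lattice site i.
H = J \sum_{\langle i,j \rangle} \mathbf{S}_i \cdot \mathbf{S}_j
```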
Abstract:
Early identification of beginning readers at risk of developing reading and writing difficulties plays an important role in the prevention and provision of appropriate intervention. In Tanzania, as in other countries, there are children in schools who are at risk of developing reading and writing difficulties. Many of these children complete school without being identified and without proper and relevant support. The main language in Tanzania is Kiswahili, a transparent language. Contextually relevant, reliable and valid instruments of identification are needed in Tanzanian schools. This study aimed at the construction and validation of a group-based screening instrument in the Kiswahili language for identifying beginning readers at risk of reading and writing difficulties. In studying the function of the test there was special interest in analyzing the explanatory power of certain contextual factors related to the home and school. Halfway through grade one, 337 children from four purposively selected primary schools in Morogoro municipality were screened with a group test consisting of 7 subscales measuring phonological awareness, word and letter knowledge and spelling. A questionnaire about background factors and the home and school environments related to literacy was also used. The schools were chosen based on performance status (i.e. high, good, average and low performing schools) in order to include variation. For validation, 64 children were chosen from the original sample to take an individual test measuring nonsense word reading, word reading, actual text reading, one-minute reading and writing. School marks from grade one and a follow-up test halfway through grade two were also used for validation. The correlations between the results from the group test and the three measures used for validation were very high (.83-.95). Content validity of the group test was established by using items drawn from authorized text books for reading in grade one. Construct validity was analyzed through item analysis and principal component analysis. The difficulty level of most items in both the group test and the follow-up test was good. The items also discriminated well. Principal component analysis revealed one powerful latent dimension (initial literacy factor), accounting for 93% of the variance. This implies that it could be possible to use any set of the subtests of the group test for screening and prediction. The K-Means cluster analysis revealed four clusters: at-risk children, strugglers, readers and good readers. The main concern in this study was with the groups of at-risk children (24%) and strugglers (22%), who need the most assistance. The predictive validity of the group test was analyzed by correlating the measures from the two school years and by cross tabulating grade one and grade two clusters. All the correlations were positive and very high, and 94% of the at-risk children in grade two were already identified in the group test in grade one. The explanatory power of some of the home and school factors was very strong. The number of books at home accounted for 38% of the variance in reading and writing ability measured by the group test. Parents' reading ability and the support children received at home for schoolwork were also influential factors. Among the studied school factors school attendance had the strongest explanatory power, accounting for 21% of the variance in reading and writing ability. Having been in nursery school was also of importance.
Based on the findings in the study a short version of the group test was created. It is suggested for use in the screening processes in grade one aiming at identifying children at risk of reading and writing difficulties in the Tanzanian context. Suggestions for further research as well as for actions for improving the literacy skills of Tanzanian children are presented.
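A minimal sketch of the two statistical steps described above, with random subtest scores standing in for the real screening data: a principal component analysis to check for a dominant latent dimension, and a K-means clustering into four performance groups.

```python
# Sketch only: random subtest scores stand in for the real screening data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)
ability = rng.standard_normal(337)                                # latent literacy factor
scores = ability[:, None] + 0.3 * rng.standard_normal((337, 7))   # 7 subscales

pca = PCA().fit(scores)
print("variance explained by first component:", pca.explained_variance_ratio_[0])

clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scores)
print("cluster sizes:", np.bincount(clusters))
```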
Abstract:
This work is devoted to the analysis of signal variation in the Cross-Direction and Machine-Direction measurements from a paper web. The data that we possess comes from a real paper machine. The goal of the work is to reconstruct the basis weight structure of the paper and to predict its behaviour into the future. The resulting synthetic data is needed for simulation of the paper web. The main idea that we use for describing the basis weight variation in the Cross-Direction is the Empirical Orthogonal Functions (EOF) algorithm, which is closely related to the Principal Component Analysis (PCA) method. Signal forecasting in time is based on time-series analysis. The two principal mathematical procedures used in the work are Autoregressive Moving Average (ARMA) modelling and the Ornstein–Uhlenbeck (OU) process.
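A minimal sketch of the two ingredients on synthetic profiles: an EOF/PCA decomposition of cross-direction scans via the SVD, and an Euler-Maruyama simulation of an Ornstein-Uhlenbeck process for evolving one score coefficient in time. Neither part uses the real machine data.

```python
# Sketch only: synthetic cross-direction profiles and one simulated score.
import numpy as np

rng = np.random.default_rng(9)
profiles = rng.standard_normal((200, 60))          # 200 scans x 60 CD positions
profiles -= profiles.mean(axis=0)

U, s, Vt = np.linalg.svd(profiles, full_matrices=False)
eofs = Vt[:3]                                      # leading CD basis functions (EOFs)
scores = profiles @ eofs.T                         # their time-varying amplitudes
print("score matrix shape:", scores.shape)

# Euler-Maruyama step of an OU process: dx = -theta*x dt + sigma dW
theta, sigma, dt = 0.5, 1.0, 0.1
x = np.zeros(500)
for k in range(1, x.size):
    x[k] = x[k - 1] - theta * x[k - 1] * dt + sigma * np.sqrt(dt) * rng.standard_normal()
print("simulated variance:", x.var(), "stationary value:", sigma ** 2 / (2 * theta))
```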
Abstract:
In this thesis, a classification problem in predicting the creditworthiness of a customer is tackled. This is done by proposing a reliable classification procedure on a given data set. The aim of this thesis is to design a model that gives the best classification accuracy to effectively predict bankruptcy. FRPCA techniques proposed by Yang and Wang have been preferred since they are tolerant to certain types of noise in the data. These include FRPCA1, FRPCA2 and FRPCA3, from which the best method is chosen. Two different approaches are used at the classification stage: the similarity classifier and the FKNN classifier. The algorithms are tested with the Australian credit card screening data set. The results obtained indicate a mean classification accuracy of 83.22% using FRPCA1 with the similarity classifier. The FKNN approach yields a mean classification accuracy of 85.93% when used with FRPCA2, making it the better method for suitable choices of the number of nearest neighbors and the fuzziness parameters. Details on the calibration of the fuzziness parameter and other parameters associated with the similarity classifier are discussed.
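A minimal sketch of the overall pipeline with crisp stand-ins and synthetic two-class data: ordinary PCA in place of the FRPCA variants and a plain k-nearest-neighbour classifier in place of FKNN, with accuracy estimated by cross-validation rather than on the Australian credit set.

```python
# Sketch only: crisp PCA + KNN pipeline on synthetic two-class data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(10)
X = np.vstack([rng.standard_normal((150, 14)),
               rng.standard_normal((150, 14)) + 0.8])
y = np.repeat([0, 1], 150)                         # two credit classes

model = make_pipeline(PCA(n_components=5), KNeighborsClassifier(n_neighbors=7))
print("mean CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```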