15 results for Discriminant analysis
in Aston University Research Archive
Abstract:
Discriminant analysis (also known as discriminant function analysis or multiple discriminant analysis) is a multivariate statistical method of testing the degree to which two or more populations may overlap with each other. It was devised independently by several statisticians including Fisher, Mahalanobis, and Hotelling. The technique has several possible applications in microbiology. First, in a clinical microbiological setting, if two different infectious diseases were defined by a number of clinical and pathological variables, it may be useful to decide which measurements were the most effective at distinguishing between the two diseases. Second, in an environmental microbiological setting, the technique could be used to study the relationships between different populations, e.g., to what extent do the properties of soils in which the bacterium Azotobacter is found differ from those in which it is absent? Third, the method can be used as a multivariate ‘t’ test, i.e., given a number of related measurements on two groups, the analysis can provide a single test of the hypothesis that the two populations have the same means for all the variables studied. This statnote describes one of the most popular applications of discriminant analysis in identifying the descriptive variables that can distinguish between two populations.
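As a rough illustration of the two-population case described above, the sketch below computes Fisher's linear discriminant weights for two groups measured on two variables, then derives Mahalanobis' D² and Hotelling's T² (the multivariate ‘t’ test statistic) from the same quantities. The soil measurements, variable names and function names are invented for illustration and are not taken from the statnote.

```python
from statistics import mean

# Hypothetical (pH, moisture %) soil measurements, invented for illustration only.
present = [(7.1, 30.0), (7.4, 28.0), (6.9, 33.0), (7.3, 31.0)]  # Azotobacter found
absent = [(5.8, 29.0), (6.1, 32.0), (5.6, 30.0), (6.0, 27.0)]   # Azotobacter absent


def group_means(rows):
    """Mean of each of the two variables for one group."""
    return mean(r[0] for r in rows), mean(r[1] for r in rows)


def scatter(rows):
    """Sums of squares and cross-products about the group means."""
    mx, my = group_means(rows)
    sxx = sum((r[0] - mx) ** 2 for r in rows)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows)
    syy = sum((r[1] - my) ** 2 for r in rows)
    return sxx, sxy, syy


def pooled_covariance(a, b):
    """Pooled within-group covariance (sxx, sxy, syy) with n1 + n2 - 2 d.f."""
    dof = len(a) + len(b) - 2
    return tuple((u + v) / dof for u, v in zip(scatter(a), scatter(b)))


def discriminant_weights(a, b):
    """Fisher's weights w = S^-1 (mean_a - mean_b), using a 2x2 inverse."""
    sxx, sxy, syy = pooled_covariance(a, b)
    det = sxx * syy - sxy ** 2
    (ax, ay), (bx, by) = group_means(a), group_means(b)
    dx, dy = ax - bx, ay - by
    return (syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det


weights = discriminant_weights(present, absent)


def score(row, w=weights):
    """Discriminant score: weighted sum of the two measurements."""
    return w[0] * row[0] + w[1] * row[1]


scores_present = [score(r) for r in present]
scores_absent = [score(r) for r in absent]

# Mahalanobis D^2 equals the difference between the mean discriminant scores.
d2 = mean(scores_present) - mean(scores_absent)

# Hotelling's T^2 for the hypothesis that both groups share the same mean vector.
n1, n2 = len(present), len(absent)
t2 = (n1 * n2) / (n1 + n2) * d2

print("weights:", weights)
print("D^2:", d2, "T^2:", t2)
```

The raw weights depend on the measurement units, so comparing the contribution of each variable would normally require standardising the variables first; that step is omitted here to keep the sketch short.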
Abstract:
The accurate in silico identification of T-cell epitopes is a critical step in the development of peptide-based vaccines, reagents, and diagnostics. It has a direct impact on the success of subsequent experimental work. Epitopes arise as a consequence of complex proteolytic processing within the cell. Prior to being recognized by T cells, an epitope is presented on the cell surface as a complex with a major histocompatibility complex (MHC) protein. A prerequisite therefore for T-cell recognition is that an epitope is also a good MHC binder. Thus, T-cell epitope prediction overlaps strongly with the prediction of MHC binding. In the present study, we compare discriminant analysis and multiple linear regression as algorithmic engines for the definition of quantitative matrices for binding affinity prediction. We apply these methods to peptides which bind the well-studied human MHC allele HLA-A*0201. A matrix which results from combining results of the two methods proved powerfully predictive under cross-validation. The new matrix was also tested on an external set of 160 binders to HLA-A*0201; it was able to recognize 135 (84%) of them.
Abstract:
Most existing color-based tracking algorithms use the statistical color information of the object as the tracking cue, without maintaining the spatial structure within a single chromatic image. Recent research on multilinear algebra makes it possible to preserve the spatial structural relationships in a representation of image ensembles. In this paper, a third-order color tensor is constructed to represent the object to be tracked. To account for the influence of environmental changes on tracking, biased discriminant analysis (BDA) is extended to tensor biased discriminant analysis (TBDA) for distinguishing the object from the background. At the same time, an incremental scheme for TBDA is developed for online learning of the tensor biased discriminant subspace, which can be used to adapt to appearance variations of both the object and the background. The experimental results show that the proposed method can precisely track objects undergoing large pose, scale and lighting changes, as well as partial occlusion. © 2009 Elsevier B.V.
Abstract:
This thesis describes the development of a complete data visualisation system for large tabular databases, such as those commonly found in a business environment. A state-of-the-art 'cyberspace cell' data visualisation technique was investigated and a powerful visualisation system using it was implemented. Although allowing databases to be explored and conclusions drawn, it had several drawbacks, the majority of which were due to the three-dimensional nature of the visualisation. A novel two-dimensional generic visualisation system, known as MADEN, was then developed and implemented, based upon a 2-D matrix of 'density plots'. MADEN allows an entire high-dimensional database to be visualised in one window, while permitting close analysis in 'enlargement' windows. Selections of records can be made and examined, and dependencies between fields can be investigated in detail. MADEN was used as a tool for investigating and assessing many data processing algorithms, firstly data-reducing (clustering) methods, then dimensionality-reducing techniques. These included a new 'directed' form of principal components analysis, several novel applications of artificial neural networks, and discriminant analysis techniques which illustrated how groups within a database can be separated. To illustrate the power of the system, MADEN was used to explore customer databases from two financial institutions, resulting in a number of discoveries which would be of interest to a marketing manager. Finally, the database of results from the 1992 UK Research Assessment Exercise was analysed. Using MADEN allowed both universities and disciplines to be graphically compared, and supplied some startling revelations, including empirical evidence of the 'Oxbridge factor'.
Abstract:
The thesis began as a study of new firm formation. Preliminary research suggested that infant death rate was considered to be a closely related problem and the search was for a theory of new firm formation which would explain both. The thesis finds theories of exit and entry inadequate in this respect and focusses instead on theories of entrepreneurship, particularly those which concentrate on entrepreneurship as an agent of change. The role of information is found to be fundamental to economic change and an understanding of information generation and dissemination and the nature and direction of information flows is postulated to lead coterminously to an understanding of entrepreneurship and economic change. The economics of information is applied to theories of entrepreneurship and some testable hypotheses are derived. The testing relies on establishing and measuring the information bases of the founders of new firms and then testing for certain hypothesised differences between the information bases of survivors and non-survivors. No theory of entrepreneurship is likely to be straightforwardly testable and many postulates have to be established to bring the theory to a testable stage. A questionnaire is used to gather information from a sample of firms taken from a new micro-data set established as part of the work of the thesis. Discriminant Analysis establishes the variables which best distinguish between survivors and non-survivors. The variables which emerge as important discriminators are consistent with the theory which the analysis is testing. While there are alternative interpretations of the important variables, collective consistency with the theory under test is established. The thesis concludes with an examination of the implications of the theory for policy towards stimulating new firm formation.
Abstract:
Since the Second World War a range of policies have been implemented by central and local government agencies, with a view to improving accessibility to facilities, housing and employment opportunities within rural areas. It has been suggested that a lack of reasonable access to a range of such facilities and opportunities constitutes a key aspect of deprivation or disadvantage for rural residents. Despite considerable interest, very few attempts have been made to assess the nature and incidence of this disadvantage or the reaction of different sections of the population of rural areas to it. Moreover, almost all previous assessments have relied on so-called 'objective' measures of accessibility and disadvantage and failed to consider the relationship between such measures and 'subjective' measures such as individual perceptions. It is this gap in knowledge that the research described in this thesis has addressed. Following a critical review of relevant literature the thesis describes the way in which data on 'objective' and 'subjective' indicators of accessibility and behavioural responses to accessibility problems was collected in six case study areas in Shropshire. Analysis of this data indicates that planning and other government policies have failed to significantly improve rural residents' accessibility to their basic requirements, and may in some cases have exacerbated it, and that as a result certain sections of the rural population are relatively disadvantaged. Moreover, analysis shows that certain aspects of individual 'subjective' assessments of such accessibility disadvantage are significantly associated with more easily-obtained 'objective' measures. By using discriminant analysis the research demonstrates that it is possible to predict the likely levels of satisfaction with access to facilities from a range of 'objective' measures.
The research concludes by highlighting the potential practical applications of such indicators in policy formulation, policy appraisal and policy evaluation.
Abstract:
Decomposition of domestic wastes in an anaerobic environment results in the production of landfill gas. Public concern about landfill disposal and particularly the production of landfill gas has been heightened over the past decade. This has been due in large part to the increased quantities of gas being generated as a result of modern disposal techniques, and also to their increasing effect on modern urban developments. In order to avert disasters, effective means of preventing gas migration are required. This, in turn, requires accurate detection and monitoring of gas in the subsurface. Point sampling techniques have many drawbacks, and accurate measurement of gas is difficult. Some of the disadvantages of these techniques could be overcome by assessing the impact of gas on biological systems. This research explores the effects of landfill gas on plants, and hence on the spectral response of vegetation canopies. Examination of the landfill gas/vegetation relationship is covered, both by review of the literature and statistical analysis of field data. The work showed that, although vegetation health was related to landfill gas, it was not possible to define a simple correlation. In the landfill environment, contribution from other variables, such as soil characteristics, frequently confused the relationship. Two sites are investigated in detail, the sites contrasting in terms of the data available, site conditions, and the degree of damage to vegetation. Gas migration at the Panshanger site was dominantly upwards, affecting crops being grown on the landfill cap. The injury was expressed as an overall decline in plant health. Discriminant analysis was used to account for the variations in plant health, and hence the differences in spectral response of the crop canopy, using a combination of soil and gas variables. Damage to both woodland and crops at the Ware site was severe, and could be easily related to the presence of gas.
Air photographs, aerial video, and airborne thematic mapper data were used to identify damage to vegetation, and relate this to soil type. The utility of different sensors for this type of application is assessed, and possible improvements that could lead to more widespread use are identified. The situations in which remote sensing data could be combined with ground survey are identified. In addition, a possible methodology for integrating the two approaches is suggested.
Abstract:
With business incubators deemed as a potent infrastructural element for entrepreneurship development, business incubation management practice and performance have received widespread attention. However, despite this surge of interest, scholars have questioned the extent to which business incubation delivers added value. Thus, there is a growing awareness among researchers, practitioners and policy makers of the need for more rigorous evaluation of the business incubation output performance. Aligned to this is an increasing demand for benchmarking business incubation input/process performance and highlighting best practice. This paper offers a business incubation assessment framework, which considers input/process and output performance domains with relevant indicators. This tool adds value on different levels. It has been developed in collaboration with practitioners and industry experts and therefore it would be relevant and useful to business incubation managers. Once a large enough database of completed questionnaires has been populated on an online platform managed by a coordinating mechanism, such as a business incubation membership association, business incubator managers can reflect on their practices by using this assessment framework to learn their relative position vis-à-vis their peers against each domain. This will enable them to align with best practice in this field. Beyond implications for business incubation management practice, this performance assessment framework would also be useful to researchers and policy makers concerned with business incubation management practice and impact. Future large-scale research could test for construct validity and reliability. Also, discriminant analysis could help link input and process indicators with output measures.
Abstract:
Background: Allergy is a form of hypersensitivity to normally innocuous substances, such as dust, pollen, foods or drugs. Allergens are small antigens that commonly provoke an IgE antibody response. There are two types of bioinformatics-based allergen prediction. The first approach follows FAO/WHO Codex alimentarius guidelines and searches for sequence similarity. The second approach is based on identifying conserved allergenicity-related linear motifs. Both approaches assume that allergenicity is a linearly coded property. In the present study, we applied ACC pre-processing to sets of known allergens, developing alignment-independent models for allergen recognition based on the main chemical properties of amino acid sequences. Results: A set of 684 food, 1,156 inhalant and 555 toxin allergens was collected from several databases. A set of non-allergens from the same species was selected to mirror the allergen set. The amino acids in the protein sequences were described by three z-descriptors (z1, z2 and z3) and were converted into uniform vectors by auto- and cross-covariance (ACC) transformation. Each protein was presented as a vector of 45 variables. Five machine learning methods for classification were applied in the study to derive models for allergen prediction. The methods were: discriminant analysis by partial least squares (DA-PLS), logistic regression (LR), decision tree (DT), naïve Bayes (NB) and k nearest neighbours (kNN). The best performing model was derived by kNN at k = 3. It was optimized, cross-validated and implemented in a server named AllerTOP, freely accessible at http://www.pharmfac.net/allertop. AllerTOP also predicts the most probable route of exposure. In comparison to other servers for allergen prediction, AllerTOP outperforms them with 94% sensitivity. Conclusions: AllerTOP is the first alignment-free server for in silico prediction of allergens based on the main physicochemical properties of proteins.
Significantly, as well as allergenicity, AllerTOP is able to predict the route of allergen exposure: food, inhalant or toxin. © 2013 Dimitrov et al.; licensee BioMed Central Ltd.
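The ACC step in the abstract above turns sequences of different lengths into vectors of equal length. The sketch below shows one common form of the auto- and cross-covariance transform, under two assumptions that are not stated in the abstract: a maximum lag of 5 (chosen because 3 descriptors × 3 descriptors × 5 lags gives the 45 variables mentioned), and a placeholder descriptor table whose numbers are invented and are not the published z-scale values. The names `TOY_Z` and `acc_transform` are hypothetical.

```python
# Placeholder three-value descriptors per residue: invented numbers, NOT the
# published z1/z2/z3 scales. Only a few residues are included for brevity.
TOY_Z = {
    "A": (0.1, -1.0, 0.3),
    "C": (0.8, -0.9, 0.2),
    "D": (3.6, 1.1, 2.4),
    "E": (3.1, 0.4, -0.1),
    "G": (2.0, -4.0, 0.4),
    "K": (2.3, 1.7, -2.5),
    "L": (-4.2, -1.0, -1.0),
}


def acc_transform(sequence, descriptors=TOY_Z, max_lag=5):
    """Return a fixed-length ACC vector for a residue sequence.

    For each lag and each ordered pair of descriptors (j, k), average the
    product of descriptor j at position i and descriptor k at position i + lag.
    Pairs with j == k are auto-covariances; the rest are cross-covariances.
    """
    z = [descriptors[residue] for residue in sequence]
    n = len(z)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    n_desc = len(z[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_desc):
            for k in range(n_desc):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features


short_vec = acc_transform("ACDEGKLACD")
long_vec = acc_transform("KLGEDCAKLGEDCAAD")
print(len(short_vec), len(long_vec))
```

Because every sequence maps to a vector of the same length regardless of how many residues it has, the resulting vectors can be passed to any distance-based classifier such as k nearest neighbours without a prior alignment step.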
Abstract:
Circulating low density lipoproteins (LDL) are thought to play a crucial role in the onset and development of atherosclerosis, though the detailed molecular mechanisms responsible for their biological effects remain controversial. The complexity of biomolecules (lipids, glycans and protein) and structural features (isoforms and chemical modifications) found in LDL particles hampers the complete understanding of the mechanism underlying its atherogenicity. For this reason the screening of LDL for features discriminative of a particular pathology in search of biomarkers is of high importance. Three major biomolecule classes (lipids, protein and glycans) in LDL particles were screened using mass spectrometry coupled to liquid chromatography. Dual-polarity screening resulted in good lipidome coverage, identifying over 300 lipid species from 12 lipid sub-classes. Multivariate analysis was used to investigate potential discriminators in the individual lipid sub-classes for different study groups (age, gender, pathology). Additionally, the high protein sequence coverage of ApoB-100 routinely achieved (≥70%) assisted in the search for protein modifications correlating to aging and pathology. The large size and complexity of the datasets required the use of chemometric methods (Partial Least Square-Discriminant Analysis, PLS-DA) for their analysis and for the identification of ions that discriminate between study groups. The peptide profile from enzymatically digested ApoB-100 can be correlated with the high structural complexity of lipids associated with ApoB-100 using exploratory data analysis. In addition, using targeted scanning modes, glycosylation sites within neutral and acidic sugar residues in ApoB-100 are also being explored. 
Together or individually, knowledge of the profiles and modifications of the major biomolecules in LDL particles will contribute towards an in-depth understanding, will help to map the structural features that contribute to the atherogenicity of LDL, and may allow identification of reliable, pathology-specific biomarkers. This research was supported by a Marie Curie Intra-European Fellowship within the 7th European Community Framework Program (IEF 255076). Work of A. Rudnitskaya was supported by Portuguese Science and Technology Foundation, through the European Social Fund (ESF) and "Programa Operacional Potencial Humano - POPH".
Abstract:
Growth in availability and ability of modern statistical software has resulted in greater numbers of research techniques being applied across the marketing discipline. However, with such advances come concerns that techniques may be misinterpreted by researchers. This issue is critical since misinterpretation could cause erroneous findings. This paper investigates some assumptions regarding: 1) the assessment of discriminant validity; and 2) what confirmatory factor analysis accomplishes. Examples that address these points are presented, and some procedural remedies are suggested based upon the literature. This paper is, therefore, primarily concerned with the development of measurement theory and practice. If advances in theory development are not based upon sound methodological practice, we as researchers could be basing our work upon shaky foundations.
Abstract:
The judicial interest in ‘scientific’ evidence has driven recent work to quantify results for forensic linguistic authorship analysis. Through a methodological discussion and a worked example this paper examines the issues which complicate attempts to quantify results in work. The solution suggested to some of the difficulties is a sampling and testing strategy which helps to identify potentially useful, valid and reliable markers of authorship. An important feature of the sampling strategy is that the markers identified as being generally valid and reliable are retested for use in specific authorship analysis cases. The suggested approach for drawing quantified conclusions combines discriminant function analysis and Bayesian likelihood measures. The worked example starts with twenty comparison texts for each of three potential authors and then uses a progressively smaller comparison corpus, reducing to fifteen, ten, five and finally three texts per author. This worked example demonstrates how reducing the amount of data affects the way conclusions can be drawn. With greater numbers of reference texts, quantified and safe attributions are shown to be possible, but as the number of reference texts reduces, the analysis shows that the conclusion which should be reached is that no attribution can be made. At no point does the testing process result in a misattribution.
Abstract:
Bove, Pervan, Beatty, and Shiu [Bove, LL, Pervan, SJ, Beatty, SE, Shiu, E. Service worker role in encouraging customer organizational citizenship behaviors. J Bus Res 2009;62(7):698–705.] develop and test a latent variable model of the role of service workers in encouraging customers' organizational citizenship behaviors. However, Bove et al. [Bove, LL, Pervan, SJ, Beatty, SE, Shiu, E. Service worker role in encouraging customer organizational citizenship behaviors. J Bus Res 2009;62(7):698–705.] claim support for hypothesized relationships between constructs that, due to insufficient discriminant validity regarding certain constructs, may be inaccurate. This research comment discusses what discriminant validity represents, procedures for establishing discriminant validity, and presents an example of inaccurate discriminant validity assessment based upon the work of Bove et al. [Bove, LL, Pervan, SJ, Beatty, SE, Shiu, E. Service worker role in encouraging customer organizational citizenship behaviors. J Bus Res 2009;62(7):698–705.]. Solutions to discriminant validity problems and a five-step procedure for assessing discriminant validity then conclude the paper. This comment hopes to motivate a review of discriminant validity issues and offers assistance to future researchers conducting latent variable analysis.
Abstract:
This thesis seeks to describe the development of an inexpensive and efficient clustering technique for multivariate data analysis. The technique starts from a multivariate data matrix and ends with graphical representation of the data and pattern recognition discriminant function. The technique also results in distances frequency distribution that might be useful in detecting clustering in the data or for the estimation of parameters useful in the discrimination between the different populations in the data. The technique can also be used in feature selection. The technique is essentially for the discovery of data structure by revealing the component parts of the data. The thesis offers three distinct contributions for cluster analysis and pattern recognition techniques. The first contribution is the introduction of transformation function in the technique of nonlinear mapping. The second contribution is the use of distances frequency distribution instead of distances time-sequence in nonlinear mapping. The third contribution is the formulation of a new generalised and normalised error function together with its optimal step size formula for gradient method minimisation. The thesis consists of five chapters. The first chapter is the introduction. The second chapter describes multidimensional scaling as an origin of nonlinear mapping technique. The third chapter describes the first developing step in the technique of nonlinear mapping that is the introduction of "transformation function". The fourth chapter describes the second developing step of the nonlinear mapping technique. This is the use of distances frequency distribution instead of distances time-sequence. The chapter also includes the new generalised and normalised error function formulation. Finally, the fifth chapter, the conclusion, evaluates all developments and proposes a new program for cluster analysis and pattern recognition by integrating all the new features.