998 results for CEO selection
Abstract:
Agglomerative cluster analyses encompass many techniques, which have been widely used in various fields of science. In biology, and specifically ecology, datasets are generally highly variable and may contain outliers, which make it harder to identify the number of clusters. Here we present a new criterion to determine statistically the optimal level of partition in a classification tree. The robustness of the criterion is tested against perturbed data (outliers) using an observation or variable with randomly generated values. The technique, called Random Simulation Test (RST), is tested on (1) the well-known Iris dataset [Fisher, R.A., 1936. The use of multiple measurements in taxonomic problems. Ann. Eugenic. 7, 179–188] and (2) simulated data with predetermined numbers of clusters following Milligan and Cooper [Milligan, G.W., Cooper, M.C., 1985. An examination of procedures for determining the number of clusters in a data set. Psychometrika 50, 159–179], and finally (3) applied to real copepod community data previously analyzed in Beaugrand et al. [Beaugrand, G., Ibanez, F., Lindley, J.A., Reid, P.C., 2002. Diversity of calanoid copepods in the North Atlantic and adjacent seas: species associations and biogeography. Mar. Ecol. Prog. Ser. 232, 179–195]. The technique is compared to several standard techniques. RST generally performed better than existing algorithms on simulated data and proved to be especially efficient with highly variable datasets.
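To make the idea of choosing a partition level in a classification tree concrete, here is a minimal sketch. It is not the RST criterion, whose details the abstract above does not give. It assumes a much simpler stand-in heuristic: build a single-linkage tree on one-dimensional toy data and cut it just before the largest jump between successive merge distances. The function names and the data are hypothetical.

```python
def single_linkage_merges(points):
    """Agglomerate 1-D points with single linkage; return merge distances in order."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        heights.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return heights


def clusters_by_largest_gap(heights):
    """Stand-in heuristic (an assumption, not RST): cut before the largest jump."""
    n = len(heights) + 1  # number of original observations
    jumps = [heights[k + 1] - heights[k] for k in range(len(heights) - 1)]
    k = max(range(len(jumps)), key=lambda idx: jumps[idx])
    # After k + 1 merges, n - (k + 1) clusters remain.
    return n - (k + 1)


# Toy data with three visually obvious groups.
points = [1.0, 1.2, 1.1, 5.0, 5.3, 5.1, 9.0, 9.2]
n_clusters = clusters_by_largest_gap(single_linkage_merges(points))
print(n_clusters)
```

A gap heuristic like this is sensitive to outliers, which is the kind of weakness a statistically grounded criterion is meant to address.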
Abstract:
Macroalgae (seaweeds) are a promising feedstock for the production of third generation bioethanol, since they have high carbohydrate contents, contain little or no lignin and are available in abundance. However, seaweeds typically contain a more diverse array of monomeric sugars than are commonly present in feedstocks derived from lignocellulosic material which are currently used for bioethanol production. Hence, identification of a suitable fermentative microorganism that can utilise the principal sugars released from the hydrolysis of macroalgae remains a major objective. The present study used a phenotypic microarray technique to screen 24 different yeast strains for their ability to metabolise individual monosaccharides commonly found in seaweeds, as well as hydrolysates following an acid pre-treatment of five native UK seaweed species (Laminaria digitata, Fucus serratus, Chondrus crispus, Palmaria palmata and Ulva lactuca). Five strains of yeast (three Saccharomyces spp, one Pichia sp and one Candida sp) were selected and subsequently evaluated for bioethanol production during fermentation of the hydrolysates. Four out of the five selected strains converted these monomeric sugars into bioethanol, with the highest ethanol yield (13 g L−1) resulting from a fermentation using C. crispus hydrolysate with Saccharomyces cerevisiae YPS128. This study demonstrated the novel application of a phenotypic microarray technique to screen for yeast capable of metabolising sugars present in seaweed hydrolysates; however, metabolic activity did not always imply fermentative production of ethanol.
Abstract:
The purpose of this study is to produce a series of Conceptual Ecological Models (CEMs) that represent sublittoral rock habitats in the UK. CEMs are diagrammatic representations of the influences and processes that occur within an ecosystem. They can be used to identify critical aspects of an ecosystem that may be studied further, or serve as the basis for the selection of indicators for environmental monitoring purposes. The models produced by this project are control diagrams, representing the unimpacted state of the environment free from anthropogenic pressures. It is intended that the models produced by this project will be used to guide indicator selection for the monitoring of this habitat in UK waters. CEMs may eventually be produced for a range of habitat types defined under the UK Marine Biodiversity Monitoring R&D Programme (UKMBMP); together with stressor models, which are designed to show the interactions within impacted habitats, they would form the basis of a robust method for indicator selection. This project builds on the work to develop CEMs for shallow sublittoral coarse sediment habitats (Alexander et al. 2014). The project scope included those habitats defined as ‘sublittoral rock’. This definition includes those habitats that fall into the EUNIS Level 3 classifications A3.1 Atlantic and Mediterranean high energy infralittoral rock, A3.2 Atlantic and Mediterranean moderate energy infralittoral rock, A3.3 Atlantic and Mediterranean low energy infralittoral rock, A4.1 Atlantic and Mediterranean high energy circalittoral rock, A4.2 Atlantic and Mediterranean moderate energy circalittoral rock, and A4.3 Atlantic and Mediterranean low energy circalittoral rock, as well as the constituent Level 4 and 5 biotopes that are relevant to UK waters. A species list of characterising fauna to be included within the scope of the models was identified using an iterative process to refine the full list of species found within the relevant Level 5 biotopes.
A literature review was conducted using a pragmatic and iterative approach to gather evidence regarding species traits and information that would be used to inform the models and characterise the interactions that occur within the sublittoral rock habitat. All information gathered during the literature review was entered into a data logging pro-forma spreadsheet that accompanies this report. Wherever possible, attempts were made to collect information from UK-specific peer-reviewed studies, although other sources were used where necessary. All data gathered was subject to a detailed confidence assessment. Expert judgement by the project team was utilised to provide information for aspects of the models for which references could not be sourced within the project timeframe. A multivariate analysis approach was adopted to assess ecologically similar groups (based on ecological and life history traits) of fauna from the identified species to form the basis of the models. A model hierarchy was developed based on these ecological groups. One general control model was produced that indicated the high-level drivers, inputs, biological assemblages, ecosystem processes and outputs that occur in sublittoral rock habitats. In addition to this, seven detailed sub-models were produced, which each focussed on a particular ecological group of fauna within the habitat: ‘macroalgae’, ‘temporarily or permanently attached active filter feeders’, ‘temporarily or permanently attached passive filter feeders’, ‘bivalves, brachiopods and other encrusting filter feeders’, ‘tube building fauna’, ‘scavengers and predatory fauna’, and ‘non-predatory mobile fauna’. Each sub-model is accompanied by an associated confidence model that presents confidence in the links between each model component. The models are split into seven levels and take spatial and temporal scale into account through their design, as well as magnitude and direction of influence. 
The seven levels include regional to global drivers, water column processes, local inputs/processes at the seabed, habitat and biological assemblage, output processes, local ecosystem functions, and regional to global ecosystem functions. The models indicate that whilst the high level drivers that affect each ecological group are largely similar, the output processes performed by the biota and the resulting ecosystem functions vary both in number and importance between groups. Confidence within the models as a whole is generally high, reflecting the level of information gathered during the literature review. Physical drivers which influence the ecosystem were found to be of high importance for the sublittoral rock habitat, with factors such as wave exposure, water depth and water currents noted to be crucial in defining the biological assemblages. Other important factors such as recruitment/propagule supply, and those which affect primary production, such as suspended sediments, light attenuation and water chemistry and temperature, were also noted to be key and act to influence the food sources consumed by the biological assemblages of the habitat, and the biological assemblages themselves. Output processes performed by the biological assemblages are variable between ecological groups depending on the specific flora and fauna present and the role they perform within the ecosystem. Of particular importance are the outputs performed by the macroalgae group, which are diverse in nature and exert influence over other ecological groups in the habitat. Important output processes from the habitat as a whole include primary and secondary production, bioengineering, biodeposition (in mixed sediment habitats) and the supply of propagules; these in turn influence ecosystem functions at the local scale such as nutrient and biogeochemical cycling, supply of food resources, sediment stability (in mixed sediment habitats), habitat provision and population and algae control. 
The export of biodiversity and organic matter, biodiversity enhancement and biotope stability are the resulting ecosystem functions that occur at the regional to global scale. Features within the models that are most useful for monitoring habitat status and change due to natural variation have been identified, as have those that may be useful for monitoring to identify anthropogenic causes of change within the ecosystem. Biological, physical and chemical features of the ecosystem have been identified as potential indicators to monitor natural variation, whereas biological factors and those physical/chemical factors most likely to affect primary production have predominantly been identified as most likely to indicate change due to anthropogenic pressures.
Abstract:
This study examines the relation between selection power and selection labor for information retrieval (IR). It is the first part of the development of a labor theoretic approach to IR. Existing models for evaluation of IR systems are reviewed and the distinction of operational from experimental systems partly dissolved. The often covert, but powerful, influence from technology on practice and theory is rendered explicit. Selection power is understood as the human ability to make informed choices between objects or representations of objects and is adopted as the primary value for IR. Selection power is conceived as a property of human consciousness, which can be assisted or frustrated by system design. The concept of selection power is further elucidated, and its value supported, by an example of the discrimination enabled by index descriptions, the discovery of analogous concepts in partly independent scholarly and wider public discourses, and its embodiment in the design and use of systems. Selection power is regarded as produced by selection labor, with the nature of that labor changing with different historical conditions and concurrent information technologies. Selection labor can itself be decomposed into description and search labor. Selection labor and its decomposition into description and search labor will be treated in a subsequent article, in a further development of a labor theoretic approach to information retrieval.
Abstract:
Feature selection and feature weighting are useful techniques for improving the classification accuracy of the K-nearest-neighbor (K-NN) rule. The term feature selection refers to algorithms that select the best subset of the input feature set. In feature weighting, each feature is multiplied by a weight value proportional to the ability of the feature to distinguish pattern classes. In this paper, a novel hybrid approach based on the Tabu Search (TS) heuristic is proposed for simultaneous feature selection and feature weighting of the K-NN rule. The proposed TS heuristic in combination with the K-NN classifier is compared with several classifiers on various available data sets. The results indicate a significant improvement in classification accuracy. The proposed TS heuristic is also compared with various feature selection algorithms. The experiments revealed that the proposed hybrid TS heuristic is superior to both simple TS and sequential search algorithms. We also present results for the classification of prostate cancer using multispectral images, an important problem in biomedicine.
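The feature-weighting step described in the abstract above can be sketched as follows. This is a sketch under stated assumptions, not the paper's method. It shows only how per-feature weights enter the K-NN distance, with a weight of 0 acting as feature selection. The Tabu Search that would choose the weights is not implemented, and the function name, toy data and weight values are hypothetical.

```python
import math
from collections import Counter


def weighted_knn_predict(train_X, train_y, query, weights, k=3):
    """Majority vote among the k nearest training points.

    Each feature difference is multiplied by its weight before the
    Euclidean distance is computed; a weight of 0 drops the feature.
    """
    def dist(a, b):
        return math.sqrt(sum((w * (x - y)) ** 2 for w, x, y in zip(weights, a, b)))

    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 misleads.
train_X = [(0.0, 1.0), (0.2, 2.0), (0.1, 3.0), (1.0, 8.5), (0.9, 9.0), (1.1, 8.0)]
train_y = ["a", "a", "a", "b", "b", "b"]
query = (0.1, 8.6)

# Equal weights: the misleading feature dominates the distance.
unweighted = weighted_knn_predict(train_X, train_y, query, weights=(1.0, 1.0))
# Weight 0 on feature 1: only the informative feature is used.
selected = weighted_knn_predict(train_X, train_y, query, weights=(1.0, 0.0))
print(unweighted, selected)
```

In a search-based approach, a heuristic such as TS would explore candidate weight vectors and score each one by the resulting classification accuracy.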
Abstract:
The identification of non-linear systems using only observed finite datasets has become a mature research area over the last two decades. A class of linear-in-the-parameter models with universal approximation capabilities has been intensively studied and widely used due to the availability of many linear-learning algorithms and their inherent convergence conditions. This article presents a systematic overview of basic research on model selection approaches for linear-in-the-parameter models. One of the fundamental problems in non-linear system identification is to find the minimal model with the best model generalisation performance from observational data only. The important concepts for achieving good model generalisation used in various non-linear system-identification algorithms are first reviewed, including Bayesian parameter regularisation and model selection criteria based on cross-validation and experimental design. A significant advance in machine learning has been the development of the support vector machine as a means for identifying kernel models based on the structural risk minimisation principle. Developments in convex optimisation-based model construction algorithms, including support vector regression algorithms, are outlined. Input selection algorithms and on-line system identification algorithms are also included in this review. Finally, some industrial applications of non-linear models are discussed.
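As an illustration of model selection for a linear-in-the-parameter model, the sketch below fits polynomial models of increasing degree by least squares and compares them by leave-one-out cross-validation error. The polynomial basis, the toy data and all function names are assumptions made for this example; the review itself covers a much broader family of models and criteria.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit(xs, ys, degree):
    """Least-squares weights for y ~ sum_j w_j * x**j (linear in w)."""
    Phi = [[x ** j for j in range(degree + 1)] for x in xs]
    m = degree + 1
    # Normal equations: (Phi^T Phi) w = Phi^T y.
    A = [[sum(p[i] * p[j] for p in Phi) for j in range(m)] for i in range(m)]
    b = [sum(p[i] * y for p, y in zip(Phi, ys)) for i in range(m)]
    return solve(A, b)


def predict(w, x):
    return sum(wj * x ** j for j, wj in enumerate(w))


def loo_error(xs, ys, degree):
    """Mean squared leave-one-out prediction error for a given degree."""
    total = 0.0
    for i in range(len(xs)):
        w = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], degree)
        total += (ys[i] - predict(w, xs[i])) ** 2
    return total / len(xs)


# Toy data (hypothetical): a quadratic plus small fixed perturbations.
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
noise = [0.1, -0.1, 0.05, -0.05, 0.0, 0.08, -0.08, 0.03, -0.03]
ys = [x * x + e for x, e in zip(xs, noise)]

errors = {d: loo_error(xs, ys, d) for d in range(5)}
best_degree = min(errors, key=errors.get)
print(best_degree, errors)
```

Because the model is linear in its weights, each candidate structure is fitted by solving a linear system, so a criterion such as cross-validation can compare many structures at modest cost.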
Abstract:
Clustering analysis of data from DNA microarray hybridization studies is an essential task for identifying biologically relevant groups of genes. The attribute cluster algorithm (ACA) has provided an attractive way to group and select meaningful genes. However, ACA needs much prior knowledge about the genes to set the number of clusters. In practical applications, if the number of clusters is misspecified, the performance of ACA deteriorates rapidly. In fact, setting this number correctly is very demanding because so little is known in advance. We propose the Cooperative Competition Cluster Algorithm (CCCA) in this paper. In the algorithm, we assume that both cooperation and competition exist simultaneously between clusters in the process of clustering. By using this principle of Cooperative Competition, the number of clusters can be found in the process of clustering. Experimental results on synthetic and gene expression data are presented. The results show that CCCA can choose the number of clusters automatically and achieves excellent performance compared with other competing methods.