894 results for: nonparametric data, self organising maps, Australia, Queensland, subtropical, coastal catchment
Abstract:
In the past decade, the amount of data in the biological field has grown rapidly; new techniques and tools for the analysis of biological data have been developed and introduced. Several computational methods are based on unsupervised neural network algorithms that are widely used for multiple purposes, including clustering and visualization, e.g. the Self Organizing Maps (SOM). Unfortunately, even though this method is unsupervised, its performance in terms of quality of results and learning speed is strongly dependent on the initialization of the neuron weights. In this paper we present a new initialization technique based on a totally connected undirected graph that reports relations among some interesting features of the input data. Results of experimental tests, in which the proposed algorithm is compared to the original initialization techniques, show that our technique ensures faster learning and better performance in terms of quantization error.
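For readers who want to reproduce the kind of comparison described above, the sketch below contrasts random and data-driven SOM weight initialization using the MiniSom library; PCA-based initialization stands in for the paper's graph-based scheme (a substitution, not the authors' method), and the data are synthetic placeholders.

```python
# Sketch: effect of SOM weight initialization on quantization error.
# PCA initialization is used as a stand-in for the paper's graph-based
# scheme; the "biological" feature vectors below are synthetic.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 10))            # placeholder feature vectors
data = (data - data.mean(0)) / data.std(0)   # standardize features

def train(init):
    som = MiniSom(10, 10, data.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
    if init == "random":
        som.random_weights_init(data)
    else:                                    # data-driven initialization
        som.pca_weights_init(data)
    som.train_random(data, 2000)
    return som.quantization_error(data)

for init in ("random", "pca"):
    print(init, train(init))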
Abstract:
In Information Visualization, adding and removing data elements can strongly impact the underlying visual space. We have developed an inherently incremental technique (incBoard) that maintains a coherent disposition of elements from a dynamic multidimensional data set on a 2D grid as the set changes. Here, we introduce a novel layout (incSpace) that uses pairwise similarity from grid neighbors, as defined in incBoard, to reposition elements on the visual space, free from constraints imposed by the grid. The board continues to be updated and can be displayed alongside the new space. As similar items are placed together, while dissimilar neighbors are moved apart, it supports users in the identification of clusters and subsets of related elements. Densely populated areas identified in the incSpace can be efficiently explored with the corresponding incBoard visualization, which is not susceptible to occlusion. The solution remains inherently incremental and maintains a coherent disposition of elements, even for fully renewed sets. The algorithm considers relative positions for the initial placement of elements, and raw dissimilarity to fine tune the visualization. It has low computational cost, with complexity depending only on the size of the currently viewed subset, V. Thus, a data set of size N can be sequentially displayed in O(N) time, reaching O(N²) only if the complete set is simultaneously displayed.
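The published incBoard/incSpace algorithms are not reproduced here; the following is only a rough sketch of the incremental-placement idea (each new element is dropped into the free grid cell nearest to its most similar already-placed element), with made-up data.

```python
# Minimal sketch of incremental grid placement in the spirit of incBoard
# (not the published algorithm): each new element goes to the free cell
# closest to the cell of its most similar already-placed element.
import numpy as np

def incremental_grid(points, side):
    """points: (N, d) array; side: grid side length. Returns index -> cell."""
    free = {(r, c) for r in range(side) for c in range(side)}
    placed = {}                                     # index -> (row, col)
    for i, p in enumerate(points):
        if not placed:                              # first element goes to the centre
            cell = (side // 2, side // 2)
        else:
            # most similar (closest in feature space) already-placed element
            j = min(placed, key=lambda k: np.linalg.norm(points[k] - p))
            r0, c0 = placed[j]
            # nearest free cell to that neighbour's cell
            cell = min(free, key=lambda rc: abs(rc[0] - r0) + abs(rc[1] - c0))
        placed[i] = cell
        free.discard(cell)
    return placed

pts = np.random.default_rng(1).normal(size=(20, 5))
print(incremental_grid(pts, side=6))
```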
Abstract:
In soil surveys, several sampling systems can be used to define the most representative sites for sample collection and description of soil profiles. In recent years, the conditioned Latin hypercube sampling system has gained prominence for soil surveys. In Brazil, most soil maps are at small scales and in paper format, which hinders their refinement. The objectives of this work were: (i) to compare two sampling systems based on the conditioned Latin hypercube for mapping soil classes and soil properties; (ii) to retrieve information from a detailed-scale soil map of a pilot watershed for its refinement, comparing two data mining tools, and to validate the new soil map; and (iii) to create and validate a soil map of a much larger, similar area by extrapolating the information extracted from the existing soil map. Two sampling systems were created, one by the conditioned Latin hypercube and one by the cost-constrained conditioned Latin hypercube. At each prospection site, soil classification and measurement of the A horizon thickness were performed. Maps were generated and validated for each sampling system, comparing the efficiency of these methods. The conditioned Latin hypercube captured greater variability of soils and properties than the cost-constrained conditioned Latin hypercube, although the former involved greater difficulty in field work. The conditioned Latin hypercube can capture greater soil variability, and the cost-constrained conditioned Latin hypercube presents great potential for use in soil surveys, especially in areas of difficult access. From an existing detailed-scale soil map of a pilot watershed, topographical information for each soil class was extracted from a Digital Elevation Model and its derivatives by two data mining tools. Maps were generated using each tool. The more accurate of these tools was used to extrapolate soil information to a much larger, similar area, and the generated map was validated. It was possible to retrieve the existing soil map information and apply it to a larger area containing similar soil forming factors, at much lower financial cost. The KnowledgeMiner data mining tool, together with ArcSIE, used to create the soil map, presented the better results and enabled the existing soil map to be used to extract soil information and apply it to similar, larger areas at reduced cost, which is especially important in developing countries with limited financial resources for such activities, such as Brazil.
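Real conditioned Latin hypercube sampling is usually solved with simulated annealing (e.g. the R package clhs); the toy sketch below only illustrates the underlying idea of matching covariate quantile strata, using a simple random-swap search and synthetic covariates.

```python
# Rough sketch of the idea behind conditioned Latin hypercube sampling
# (cLHS): pick n sites so that each covariate's quantile strata are each
# hit roughly once. Real cLHS uses simulated annealing; this is a greedy toy.
import numpy as np

def clhs_sketch(covariates, n, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    N, k = covariates.shape
    # stratum index of every site for every covariate (n quantile bins)
    edges = np.quantile(covariates, np.linspace(0, 1, n + 1)[1:-1], axis=0)
    strata = np.stack([np.searchsorted(edges[:, j], covariates[:, j]) for j in range(k)], axis=1)

    def cost(idx):
        # deviation from "one sample per stratum" for each covariate
        return sum(np.abs(np.bincount(strata[idx, j], minlength=n) - 1).sum() for j in range(k))

    idx = rng.choice(N, n, replace=False)
    for _ in range(iters):                      # random-swap hill climbing
        cand = idx.copy()
        cand[rng.integers(n)] = rng.integers(N)
        if len(set(cand)) == n and cost(cand) <= cost(idx):
            idx = cand
    return idx

cov = np.random.default_rng(2).normal(size=(300, 3))   # e.g. DEM derivatives (synthetic)
print(clhs_sketch(cov, n=20))
```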
Abstract:
Over the last ten years, Salamanca has been considered among the most polluted cities in México. This paper presents a Self-Organizing Maps (SOM) Neural Network application to classify pollution data and automate the determination of the air pollution level for Sulphur Dioxide (SO2) in Salamanca. Meteorological parameters are well known to be important factors contributing to air quality estimation and prediction. In order to observe the behavior and clarify the influence of wind parameters on the SO2 concentrations, a SOM Neural Network has been implemented over a year of data. The main advantage of the SOM is that it allows data from different sensors to be integrated and provides readily interpretable results. In particular, it is a powerful mapping and classification tool, which presents information in an accessible way and facilitates the task of establishing an order of priority between the distinguished groups of concentrations depending on their need for further research or remediation actions in subsequent management steps. The results show a significant correlation between pollutant concentrations and some environmental variables.
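A minimal sketch of this kind of workflow, assuming synthetic SO2 and wind records and the MiniSom library; the grid size and variables are illustrative rather than those used in the study.

```python
# Hedged sketch: clustering hourly SO2 / wind records with a SOM (MiniSom).
# The variables and their distributions are synthetic stand-ins.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
# columns: SO2 concentration, wind speed, wind direction (all synthetic)
X = np.column_stack([rng.lognormal(2.0, 0.6, 1000),
                     rng.gamma(2.0, 1.5, 1000),
                     rng.uniform(0, 360, 1000)])
X = (X - X.mean(0)) / X.std(0)

som = MiniSom(8, 8, 3, sigma=1.5, learning_rate=0.5, random_seed=0)
som.random_weights_init(X)
som.train_random(X, 5000)

# map each record to its best-matching unit; units act as pollution "classes"
bmus = np.array([som.winner(x) for x in X])
units, counts = np.unique(bmus, axis=0, return_counts=True)
print(dict(zip(map(tuple, units), counts)))
```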
Abstract:
In recent years significant efforts have been devoted to the development of advanced data analysis tools, both to predict the occurrence of disruptions and to investigate the operational spaces of devices, with the long-term goal of advancing the understanding of the physics of these events and of preparing for ITER. On JET the latest generation of the disruption predictor, called APODIS, has been deployed in the real-time network during the last campaigns with the new metallic wall. Even though it was trained only with discharges with the carbon wall, it has reached very good performance, with both missed alarms and false alarms on the order of a few percent (and strategies to improve the performance have already been identified). Since predicting the type of disruption is also considered very important for the optimisation of the mitigation measures, a new clustering method, based on the geodesic distance on a probabilistic manifold, has been developed. This technique allows automatic classification of an incoming disruption with a success rate better than 85%. Various other manifold learning tools, particularly Principal Component Analysis and Self-Organised Maps, are also producing very interesting results in the comparative analysis of the JET and ASDEX Upgrade (AUG) operational spaces, on the route to developing predictors capable of extrapolating from one device to another.
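The comparative analysis of operational spaces is summarised only at a high level above; the sketch below shows one hedged reading of the PCA part of such a comparison, with placeholder features rather than actual JET or AUG signals.

```python
# Sketch of the cross-device comparison idea: project discharge features
# from two machines into one PCA space and look at the overlap.
# Feature vectors below are synthetic placeholders, not real diagnostics.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
device_a = rng.normal(0.0, 1.0, size=(400, 12))
device_b = rng.normal(0.5, 1.2, size=(300, 12))

pca = PCA(n_components=2).fit(np.vstack([device_a, device_b]))
a2d, b2d = pca.transform(device_a), pca.transform(device_b)

# crude overlap measure: fraction of device-B points inside device-A's bounding box
lo, hi = a2d.min(0), a2d.max(0)
inside = np.all((b2d >= lo) & (b2d <= hi), axis=1).mean()
print("explained variance:", pca.explained_variance_ratio_, "overlap:", inside)
```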
Abstract:
We introduce a method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). SVMs are considered a supervised computer learning method because they exploit prior knowledge of gene function to identify unknown genes of similar function from expression data. SVMs avoid several problems associated with unsupervised clustering methods, such as hierarchical clustering and self-organizing maps. SVMs have many mathematical features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers. We test several SVMs that use different similarity metrics, as well as some other supervised learning methods, and find that the SVMs best identify sets of genes with a common function using expression data. Finally, we use SVMs to predict functional roles for uncharacterized yeast ORFs based on their expression data.
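A brief sketch of the general workflow (not the authors' code or data): SVMs with different kernels are cross-validated on labelled expression profiles and contrasted with an unsupervised baseline, using synthetic stand-ins for the yeast microarray data.

```python
# Hedged sketch: compare SVM kernels on expression profiles labelled by
# functional class, with an unsupervised clustering baseline for contrast.
# The expression matrix and class labels are synthetic.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 79))                  # 79 expression conditions (illustrative)
y = (X[:, :5].mean(1) > 0).astype(int)          # stand-in functional class labels

for kernel in ("linear", "poly", "rbf"):        # different similarity functions
    score = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(kernel, round(score, 3))

# unsupervised baseline for contrast
print("k-means inertia:", KMeans(n_clusters=2, n_init=10).fit(X).inertia_)
```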
Abstract:
The effects of harvesting of callianassid shrimp (Trypaea australiensis) on the abundance and composition of macrobenthic assemblages in unvegetated sediments of a subtropical coastal embayment in Queensland, Australia, were examined using a combination of sampling and manipulative experiments. First, the abundance and composition of the benthic infauna in an area regularly used for the collection of shrimp for bait by recreational anglers was compared with multiple reference areas. Second, a BACI design, with multiple reference areas, was used to examine the short-term effects of harvesting on the benthic assemblages from an intensive commercialised fishing competition. Third, a large-scale, controlled manipulative experiment, where shrimp were harvested from 10,000 m² plots at intensities commensurate with those from recreational and commercial operators, was done to determine the impacts on different components of the infaunal assemblage. Only a few benthic taxa showed significant declines in abundance in response to the removal of ghost shrimp from the unvegetated sediments. There was evidence, however, of more subtle effects, with changes in the degree of spatial variation (patchiness) of several taxa as a result of harvesting. Groups such as capitellid polychaetes, gammarid amphipods and some bivalves were significantly more patchy in their distribution in areas subjected to harvesting than in reference areas, at a scale of tens of metres. This scale corresponds to the patterns of movement and activity of recreational harvesters working in these areas. In contrast, patchiness in the abundance of ghost shrimp decreased significantly under harvesting at scales of hundreds of metres, in response to harvesters focussing their efforts on areas with greater numbers of burrow entrances, leading to a more even distribution of the animals. Controlled experimental harvesting caused declines in the abundance of soldier crabs (Mictyris longicarpus), polychaetes and amphipods and an increase in the spatial patchiness of polychaetes. Populations of ghost shrimp were, however, resilient to harvesting over extended periods of time. In conclusion, harvesting of ghost shrimp for bait by recreational and commercial fishers causes significant but localised impacts on a limited range of benthic fauna in unvegetated sediments, including changes in the degree of spatial patchiness in their distribution. (c) 2005 Elsevier B.V. All rights reserved.
Abstract:
The effects of dredging on the benthic communities in the Noosa River, a subtropical estuary in SE Queensland, Australia, were examined using a 'Beyond BACI' experimental design. Changes in the numbers and types of animals and characteristics of the sediments in response to dredging in the coarse sandy sediments near the mouth of the estuary were compared with those occurring naturally in two control regions. Samples were collected twice before and twice after the dredging operations, at multiple spatial scales, ranging from metres to kilometres. Significant effects from the dredging were detected on the abundance of some polychaetes and bivalves and two measures of diversity (numbers of polychaete families and total taxonomic richness). In addition, the dredging caused a significant increase in the diversity of sediment particle sizes found in the dredged region compared with elsewhere. Community composition in the dredged region was more similar to that in the control regions after dredging than before. Changes in the characteristics of the sedimentary environment as a result of the dredging appeared to lead to the benthic communities of the dredged region becoming more similar to those elsewhere in the estuary, so dredging in this system may have led to the loss or reduction in area of a specific type of habitat in the estuary, with implications for overall patterns of biodiversity and ecosystem function. (c) 2006 Elsevier B.V. All rights reserved.
Abstract:
Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding this multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented are performed on a real-world data set comprising biological activity data and some whole-molecular physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help the domain experts (screening scientists, chemists, biologists, etc.) understand the data and make meaningful decisions. We also benchmark these principled methods against relatively better known visualization approaches, principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their enhanced power to help the user visualize the large multidimensional data sets one has to deal with during the early stages of the drug discovery process. The results reported clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and understand them better than the benchmarks. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool helps the domain experts explore the projections obtained from the visualization algorithms, providing facilities such as parallel coordinate plots, magnification factors, directional curvatures, and integration with industry-standard software. © 2006 American Chemical Society.
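GTM and HGTM require dedicated implementations and are not reproduced here; the sketch below covers only the benchmark side (PCA and a SOM), with synthetic descriptors and activity flags, and a simple nearest-neighbour purity score as a stand-in for visual assessment.

```python
# Sketch of the benchmark comparison only: PCA and a SOM projection of
# compound descriptors. Descriptors and activity labels are synthetic.
import numpy as np
from sklearn.decomposition import PCA
from minisom import MiniSom

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 15))                  # whole-molecule descriptors (placeholder)
active = rng.random(500) < 0.2                  # placeholder activity flags

pca2d = PCA(n_components=2).fit_transform(X)    # linear projection baseline

som = MiniSom(12, 12, X.shape[1], sigma=1.5, learning_rate=0.5, random_seed=0)
som.train_random(X, 5000)
som2d = np.array([som.winner(x) for x in X])    # grid coordinates per compound

# fraction of actives whose nearest neighbour in each projection is also active
def neighbour_purity(coords):
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return active[d.argmin(1)][active].mean()

print("PCA purity:", neighbour_purity(pca2d))
print("SOM purity:", neighbour_purity(som2d.astype(float)))
```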
Abstract:
Data visualization algorithms and feature selection techniques are both widely used in bioinformatics but as distinct analytical approaches. Until now there has been no method of measuring feature saliency while training a data visualization model. We derive a generative topographic mapping (GTM) based data visualization approach which estimates feature saliency simultaneously with the training of the visualization model. The approach not only provides a better projection by modeling irrelevant features with a separate noise model but also gives feature saliency values which help the user to assess the significance of each feature. We compare the quality of projection obtained using the new approach with the projections from traditional GTM and self-organizing maps (SOM) algorithms. The results obtained on a synthetic and a real-life chemoinformatics dataset demonstrate that the proposed approach successfully identifies feature significance and provides coherent (compact) projections. © 2006 IEEE.
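The GTM-with-feature-saliency model itself is not sketched here; the following is a loose analogue that scores each feature by how much of its variance is explained by an unsupervised mixture clustering, using synthetic data with known informative and noise features.

```python
# Loose analogue of unsupervised feature saliency (not the GTM-FS model):
# score each feature by the share of its variance explained by a mixture
# clustering; irrelevant features should score near zero.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
informative = rng.normal(loc=rng.choice([-2, 2], size=(400, 3)))  # 3 cluster-bearing features
noise = rng.normal(size=(400, 5))                                 # 5 irrelevant features
X = np.hstack([informative, noise])

labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

saliency = []
for j in range(X.shape[1]):
    within = sum(X[labels == k, j].var() * (labels == k).mean() for k in np.unique(labels))
    saliency.append(1 - within / X[:, j].var())   # high = feature follows the clusters
print(np.round(saliency, 2))
```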
Abstract:
In the primary visual cortex, neurons with similar physiological features are clustered together in columns extending through all six cortical layers. These columns form modular orientation preference maps. Long-range lateral fibers are associated with the structure of orientation maps since they do not connect columns randomly; rather, they cluster at regular intervals and interconnect predominantly columns of neurons responding to similar stimulus features. Single orientation preference maps (the joint activation of domains preferring the same orientation) were observed to emerge spontaneously, and it has been speculated that this structured ongoing activation could be caused by the underlying patchy lateral connectivity. Since long-range lateral connections share many features (e.g. clustering and orientation selectivity) with visual inter-hemispheric connections (VICs) through the corpus callosum, we used the latter as a model for long-range lateral connectivity. In order to address the question of how the lateral connectivity contributes to spontaneously generated maps of one hemisphere, we investigated how these maps react to the deactivation of VICs originating from the contralateral hemisphere. To this end, we performed experiments in eight adult cats. We recorded voltage-sensitive dye (VSD) imaging and electrophysiological spiking activity in one brain hemisphere while reversibly deactivating the other hemisphere with a cooling technique. In order to compare ongoing activity with evoked activity patterns, we first presented oriented gratings as visual stimuli. Gratings had 8 different orientations distributed equally between 0° and 180°. VSD-imaged frames obtained during ongoing activity conditions were then compared to the averaged evoked single orientation maps in three different states: baseline, cooling and recovery. Kohonen self-organizing maps were also used as a means of analysis that, unlike the averaged single-condition maps, makes no prior assumptions about ongoing activity. We also evaluated whether cooling had a differential effect on evoked and ongoing spiking activity of single units. We found that deactivating VICs caused no spatial disruption of the structure of either evoked or ongoing activity maps. The frequency with which a cardinally preferring (0° or 90°) map would emerge, however, decreased significantly for ongoing but not for evoked activity. The same result was found by training self-organizing maps with recorded data as input. Spiking activity of cardinally preferring units also decreased significantly for ongoing when compared to evoked activity. Based on our results we came to the following conclusions: 1) VICs are not a determinant factor of ongoing map structure. Maps continued to be spontaneously generated with the same quality, probably by a combination of ongoing activity from local recurrent connections, the thalamocortical loop and feedback connections. 2) VICs account for a cardinal bias in the temporal sequence of ongoing activity patterns, i.e. deactivating VICs decreases the probability that cardinal maps emerge spontaneously. 3) Inter- and intrahemispheric long-range connections might serve as a grid preparing primary visual cortex for likely junctions in a larger visual environment encompassing the two hemifields.
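As a rough illustration of the template-matching analysis described above (not the authors' pipeline), the sketch below correlates synthetic ongoing-activity frames with averaged evoked orientation templates and counts how often a cardinal map gives the best match.

```python
# Sketch of template matching: correlate each ongoing-activity frame with the
# averaged evoked orientation maps and count how often the best match is a
# cardinal (0° or 90°) map. All arrays here are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(0)
orientations = np.arange(0, 180, 22.5)                 # 8 evoked templates
templates = rng.normal(size=(8, 64, 64))               # stand-in single-orientation maps
frames = rng.normal(size=(1000, 64, 64))               # stand-in ongoing VSD frames

t = templates.reshape(8, -1)
f = frames.reshape(len(frames), -1)
t = (t - t.mean(1, keepdims=True)) / t.std(1, keepdims=True)
f = (f - f.mean(1, keepdims=True)) / f.std(1, keepdims=True)

corr = f @ t.T / t.shape[1]                            # Pearson correlation per frame/template
best = corr.argmax(1)
cardinal = np.isin(orientations[best], [0.0, 90.0])
print("fraction of frames best matched by a cardinal map:", cardinal.mean())
```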