993 results for Scatter plot
(Figure 5) Bivariate scatter plot of magnetic properties from riverine sediments of Tauranga Harbour
Abstract:
Thesis (Master's)--University of Washington, 2016-01
Abstract:
Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies.
Abstract:
Visual data mining (VDM) tools employ information visualization techniques in order to represent large amounts of high-dimensional data graphically and to involve the user in exploring data at different levels of detail. The users are looking for outliers, patterns and models – in the form of clusters, classes, trends, and relationships – in different categories of data, i.e., financial, business information, etc. The focus of this thesis is the evaluation of multidimensional visualization techniques, especially from the business user’s perspective. We address three research problems. The first problem is the evaluation of projection-based visualizations with respect to their effectiveness in preserving the original distances between data points and the clustering structure of the data. In this respect, we propose the use of existing clustering validity measures. We illustrate their usefulness in evaluating five visualization techniques: Principal Components Analysis (PCA), Sammon’s Mapping, Self-Organizing Map (SOM), Radial Coordinate Visualization and Star Coordinates. The second problem is concerned with evaluating different visualization techniques as to their effectiveness in visual data mining of business data. For this purpose, we propose an inquiry evaluation technique and conduct the evaluation of nine visualization techniques. The visualizations under evaluation are Multiple Line Graphs, Permutation Matrix, Survey Plot, Scatter Plot Matrix, Parallel Coordinates, Treemap, PCA, Sammon’s Mapping and the SOM. The third problem is the evaluation of quality of use of VDM tools. We provide a conceptual framework for evaluating the quality of use of VDM tools and apply it to the evaluation of the SOM. In the evaluation, we use an inquiry technique for which we developed a questionnaire based on the proposed framework. The contributions of the thesis consist of three new evaluation techniques and the results obtained by applying these evaluation techniques. 
The thesis provides a systematic approach to evaluation of various visualization techniques. In this respect, first, we performed and described the evaluations in a systematic way, highlighting the evaluation activities, and their inputs and outputs. Secondly, we integrated the evaluation studies in the broad framework of usability evaluation. The results of the evaluations are intended to help developers and researchers of visualization systems to select appropriate visualization techniques in specific situations. The results of the evaluations also contribute to the understanding of the strengths and limitations of the visualization techniques evaluated and further to the improvement of these techniques.
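As a minimal sketch of what "preserving the original distances between data points" can mean in practice, the snippet below computes Sammon's stress between a data set and its low-dimensional projection. This is a swapped-in illustration: the thesis proposes clustering validity measures, which are not reproduced here, and the function names and sample points are hypothetical.

```python
import math


def pairwise_distances(points):
    """Return the full matrix of Euclidean distances between points."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def sammon_stress(original, projected):
    """Sammon's stress: 0 means all pairwise distances are preserved exactly."""
    d_orig = pairwise_distances(original)
    d_proj = pairwise_distances(projected)
    total = 0.0
    error = 0.0
    n = len(original)
    for i in range(n):
        for j in range(i + 1, n):
            if d_orig[i][j] == 0:
                continue  # coincident points carry no distance information
            total += d_orig[i][j]
            error += (d_orig[i][j] - d_proj[i][j]) ** 2 / d_orig[i][j]
    if total == 0:
        raise ValueError("all original points coincide")
    return error / total


# Hypothetical 3-D points and a projection that simply drops the z axis.
original = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
projected = [(x, y) for x, y, _ in original]
print(sammon_stress(original, projected))
```

A lower stress indicates a projection that distorts distances less; it says nothing by itself about whether cluster structure is preserved.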
Abstract:
Knowledge discovery in databases is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns from data. The term data mining refers to the process that performs exploratory analysis on the data and builds a model from it. To infer patterns from data, data mining involves different approaches such as association rule mining, classification techniques or clustering techniques. Among the many data mining techniques, clustering plays a major role, since it helps to group related data for assessing properties and drawing conclusions. Most clustering algorithms act on a dataset with a uniform format, since the similarity or dissimilarity between the data points is a significant factor in finding the clusters. If a dataset consists of mixed attributes, i.e. a combination of numerical and categorical variables, a preferred approach is to convert the different formats into a uniform format. The research study explores various techniques for converting mixed data sets to a numerical equivalent, so that statistical and similar algorithms can be applied. The results of clustering mixed category data after conversion to numeric data type have been demonstrated using a crime data set. The thesis also proposes an extension to the well-known algorithm for handling mixed data types, to deal with data sets having only categorical data. The proposed conversion has been validated on a breast cancer data set. Moreover, another issue with the clustering process is the visualization of output. Different geometric techniques such as scatter plots or projection plots are available, but none of them displays the result by projecting the whole database; rather, they demonstrate attribute-pairwise analysis.
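The abstract does not say which conversion techniques the thesis uses. As a hedged illustration of one common way to bring mixed attributes into a uniform numeric format, the sketch below min-max scales numeric attributes and one-hot encodes categorical ones. The function name, field names and records are hypothetical.

```python
def encode_mixed(records, numeric_keys, categorical_keys):
    """Convert mixed-type records to numeric rows.

    Numeric attributes are min-max scaled to [0, 1]; each categorical
    attribute becomes one 0/1 column per observed category (one-hot).
    """
    ranges = {
        k: (min(r[k] for r in records), max(r[k] for r in records))
        for k in numeric_keys
    }
    categories = {k: sorted({r[k] for r in records}) for k in categorical_keys}

    rows = []
    for r in records:
        row = []
        for k in numeric_keys:
            lo, hi = ranges[k]
            # A constant attribute carries no information; map it to 0.
            row.append(0.0 if hi == lo else (r[k] - lo) / (hi - lo))
        for k in categorical_keys:
            row.extend(1.0 if r[k] == c else 0.0 for c in categories[k])
        rows.append(row)
    return rows


# Hypothetical records, not taken from the crime data set in the abstract.
records = [
    {"age": 20, "offence": "theft"},
    {"age": 40, "offence": "fraud"},
    {"age": 30, "offence": "theft"},
]
rows = encode_mixed(records, ["age"], ["offence"])
print(rows)
```

After this conversion, distance-based clustering algorithms can be applied to the rows, although one-hot columns and scaled numeric columns may still need relative weighting.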
Abstract:
Efficient optic disc segmentation is an important task in automated retinal screening. For the same reason, optic disc detection is fundamental for medical references and is important for retinal image analysis applications. The most difficult problem in optic disc extraction is locating the region of interest. Moreover, it is a time-consuming task. This paper tries to overcome this barrier by presenting an automated method for optic disc boundary extraction using Fuzzy C-Means combined with thresholding. The discs determined by the new method agree relatively well with those determined by the experts. The present method has been validated on a data set of 110 colour fundus images from the DRION database, and has obtained promising results. The performance of the system is evaluated using the difference in horizontal and vertical diameters of the obtained disc boundary and that of the ground truth obtained from two expert ophthalmologists. For the 25 test images selected from the 110 colour fundus images, the Pearson correlations of the ground truth diameters with the diameters detected by the new method are 0.946 and 0.958, and 0.94 and 0.974, respectively. The scatter plot shows that the ground truth and detected diameters have a high positive correlation. This computerized analysis of the optic disc is very useful for the diagnosis of retinal diseases.
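The Pearson correlation reported above can be computed from paired measurements as in the following minimal sketch. The diameter values are made up for illustration and are not from the DRION database; the function name is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical disc diameters in pixels: expert ground truth vs. detected.
ground_truth = [182, 190, 175, 201, 188]
detected = [180, 193, 171, 204, 186]
print(pearson_r(ground_truth, detected))
```

A high correlation shows that the two sets of diameters rise and fall together; it does not by itself show that they agree in absolute value.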
Abstract:
Impatiens noli-tangere is scarce in the UK and probably only native to the Lake District and Wales. It is the sole food plant for the endangered moth Eustroma reticulatum. Significant annual fluctuations in the size of I. noli-tangere populations endanger the continued presence of E. reticulatum in the UK. In this study, variation in population size was monitored across native populations of I. noli-tangere in the English Lake District and Wales. In 1998, there was a crash in the population size of all metapopulations in the Lake District but not of those found in Wales. A molecular survey of the genetic affinities of samples in 1999 from both regions and a reference population from Switzerland was performed using AFLP and ISSR analyses. The consensus UPGMA dendrogram and a PCO scatter plot revealed clear differentiation between the populations of I. noli-tangere in Wales and those in the Lake District. Most of the genetic variation in the UK (H_T = 0.064) was partitioned between (G_ST = 0.455) rather than within (H_S = 0.034) regions, implying that little gene flow occurs between regions. There was a similar bias towards differentiation between metapopulations in Wales, again consistent with low levels of interpopulation gene flow. This contrasts with far lower levels of differentiation in the Lake District, which suggests that modest rates of gene flow may occur between populations. It is concluded that in the event of local extinction of sites or populations, reintroductions should be restricted to samples collected from the same region. We then surveyed climatic variables to identify those most likely to cause local extinctions. Climatic correlates of population size were sought from two Lake District metapopulations situated close to a meteorological station. A combination of three climatic variables common to both sites explained 81-84% of the variation in plant number between 1990 and 2001.
Projected trends for these climatic variables were used in a Monte Carlo simulation which suggested an increased risk of I. noli-tangere population crashes by 2050 at Coniston Water but not at Derwentwater. Implications of these findings for practical conservation strategies are explored. (C) 2003 Elsevier Ltd. All rights reserved.
Abstract:
We use sunspot group observations from the Royal Greenwich Observatory (RGO) to investigate the effects of intercalibrating data from observers with different visual acuities. The tests are made by counting the number of groups RB above a variable cut-off threshold of observed total whole-spot area (uncorrected for foreshortening) to simulate what a lower acuity observer would have seen. The synthesised annual means of RB are then re-scaled to the full observed RGO group number RA using a variety of regression techniques. It is found that a very high correlation between RA and RB (rAB > 0.98) does not prevent large errors in the intercalibration (for example sunspot maximum values can be over 30 % too large even for such levels of rAB). In generating the backbone sunspot number (RBB), Svalgaard and Schatten (2015, this issue) force regression fits to pass through the scatter plot origin which generates unreliable fits (the residuals do not form a normal distribution) and causes sunspot cycle amplitudes to be exaggerated in the intercalibrated data. It is demonstrated that the use of Quantile-Quantile (“Q-Q”) plots to test for a normal distribution is a useful indicator of erroneous and misleading regression fits. Ordinary least squares linear fits, not forced to pass through the origin, are sometimes reliable (although the optimum method used is shown to be different when matching peak and average sunspot group numbers). However, other fits are only reliable if non-linear regression is used. From these results it is entirely possible that the inflation of solar cycle amplitudes in the backbone group sunspot number as one goes back in time, relative to related solar-terrestrial parameters, is entirely caused by the use of inappropriate and non-robust regression techniques to calibrate the sunspot data.
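The effect of forcing a regression line through the origin can be illustrated with a small synthetic example. The sketch below is not the authors' analysis and uses no RGO data; the numbers are invented so that the true relation has a non-zero intercept, and the function names are assumptions.

```python
def fit_through_origin(x, y):
    """Least-squares slope of y = slope * x (line forced through the origin)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def fit_ols(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic data lying exactly on y = 2x + 3 (hypothetical, for illustration).
x = [1, 2, 3, 4]
y = [5, 7, 9, 11]

slope_origin = fit_through_origin(x, y)
slope_ols, intercept_ols = fit_ols(x, y)

# Residuals of the origin-forced fit are systematically biased.
residuals_origin = [b - slope_origin * a for a, b in zip(x, y)]

print(slope_origin, slope_ols, intercept_ols)
print(residuals_origin)
```

In this toy case the origin-forced slope is steeper than the true slope, so the fit overshoots at the largest x value, and its residuals trend from positive to negative instead of scattering around zero. A Q-Q plot of such residuals against a normal distribution is one way to expose the problem.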
Abstract:
Statistical analysis of data is crucial in cephalometric investigations. There are certainly excellent examples of good statistical practice in the field, but some articles published worldwide have carried out inappropriate analyses. Objective: The purpose of this study was to show that when the double records of each patient are traced on the same occasion, a control chart for differences between readings needs to be drawn, and limits of agreement and coefficients of repeatability must be calculated. Material and methods: Data from a well-known paper in Orthodontics were used for showing common statistical practices in cephalometric investigations and for proposing a new technique of analysis. Results: A scatter plot of the two radiograph readings and the two model readings with the respective regression lines are shown. Also, a control chart for the mean of the differences between radiograph readings was obtained and a coefficient of repeatability was calculated. Conclusions: A standard error assuming that mean differences are zero, which is referred to in Orthodontics and Facial Orthopedics as the Dahlberg error, can be calculated only for estimating precision if accuracy is already proven. When double readings are collected, limits of agreement and coefficients of repeatability must be calculated. A graph with differences of readings should be presented and outliers discussed.
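The quantities named in this abstract can be computed from double readings as in the following sketch. The readings are hypothetical, the function name is an assumption, and the coefficient of repeatability uses one common convention (1.96 times the root mean square of the differences, which assumes a zero mean difference); other conventions exist.

```python
import math
import statistics


def agreement_stats(first, second):
    """Summary statistics for paired repeat readings of the same subjects."""
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    mean_d = statistics.mean(d)
    sd_d = statistics.stdev(d)  # sample standard deviation of the differences
    sum_sq = sum(v * v for v in d)
    return {
        "mean_difference": mean_d,
        # 95% limits of agreement: mean difference +/- 1.96 SD.
        "limits_of_agreement": (mean_d - 1.96 * sd_d, mean_d + 1.96 * sd_d),
        # Dahlberg error: sqrt(sum(d^2) / 2n), assumes zero mean difference.
        "dahlberg_error": math.sqrt(sum_sq / (2 * n)),
        # One convention for the coefficient of repeatability (assumption).
        "coefficient_of_repeatability": 1.96 * math.sqrt(sum_sq / n),
    }


# Hypothetical double readings (e.g. in mm) of four radiographs.
first_reading = [10, 12, 11, 13]
second_reading = [11, 11, 12, 12]
stats = agreement_stats(first_reading, second_reading)
print(stats)
```

Because the Dahlberg formula assumes that the mean difference is zero, checking the mean difference and plotting the differences first, as the abstract recommends, guards against reporting precision when a systematic bias is present.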
Abstract:
The aim of this study was to analyze the weight at birth (BW) and adjusted at 205 (W205), 365 (W365) and 550 (W550) days in beef buffaloes from Brazil, using two approaches: parametric, by normal distribution, and non-parametric, by kernel function, and thus estimating the genetic, environmental and phenotypic correlation among traits. Information on 5,169 animals at birth (BW), 3,792 at 205 days (W205), 3,883 at 365 days (W365) and 1,524 at 550 days of age (W550) was used. The birth weight distribution presented an evident discrepancy in relation to the normal distribution. However, W205, W365 and W550 presented normal distributions. The birth weight presented weak genetic, environmental, and phenotypic associations with the other weight measurements. On the other hand, the weight traits at 205, 365 and 550 days of age showed a high genetic correlation.
Abstract:
In situ megascale hydraulic diffusivities (D) of a confined loess aquifer were estimated at various scales (10 <= L <= 1500 m) by a finite difference model, and laboratory microscale diffusivities of a loess sample by empirical formulas. A scatter plot reveals that D fits a single power function of L, provided that microscale diffusivities are assigned to L = 1 m and that differences in diffusivity observed between micro- and megascales are assigned to medium heterogeneity appraised by variations in the curvature and slope of natural hydraulic head waves propagating through the aquifer. Subsequently, a general power relationship between D and L is defined where the base and exponent terms stand for the aquifer storage capability under a confined regime of flow, for the microscale hydraulic conductivity and specific yield of loess, and for the changes in curvature and slope of hydraulic head waves relative to values defined at unit scale. Editor: Z.W. Kundzewicz
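A power function D = a * L^b appears as a straight line on log-log axes, so it can be fitted by linear regression on the logarithms. The sketch below shows that idea under stated assumptions: the data are synthetic, the function name is hypothetical, and the abstract does not say which fitting procedure the authors used.

```python
import math


def fit_power_law(lengths, diffusivities):
    """Fit D = a * L**b by least squares on log(D) versus log(L).

    Returns (a, b). With this form, a is the value of D at unit scale (L = 1).
    """
    xs = [math.log(v) for v in lengths]
    ys = [math.log(v) for v in diffusivities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic scales (m) and diffusivities generated from D = 2 * L**1.5.
scales = [1, 10, 100, 1000]
diffusivity = [2 * s ** 1.5 for s in scales]
a, b = fit_power_law(scales, diffusivity)
print(a, b)
```

Assigning the microscale value to L = 1 m, as the abstract describes, fixes the point through which the fitted line passes at log(L) = 0.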
Abstract:
Synthetic Biology is a relatively new discipline, born at the beginning of the New Millennium, that brings the typical engineering approach (abstraction, modularity and standardization) to biotechnology. These principles aim to tame the extreme complexity of the various components and aid the construction of artificial biological systems with specific functions, usually by means of synthetic genetic circuits implemented in bacteria or simple eukaryotes like yeast. The cell becomes a programmable machine and its low-level programming language is made of strings of DNA. This work was performed in collaboration with researchers of the Department of Electrical Engineering of the University of Washington in Seattle and also with a student of the Corso di Laurea Magistrale in Ingegneria Biomedica at the University of Bologna: Marilisa Cortesi. During the collaboration I contributed to a Synthetic Biology project already started in the Klavins Laboratory. In particular, I modeled and subsequently simulated a synthetic genetic circuit that was ideated for the implementation of a multicelled behavior in a growing bacterial microcolony. In the first chapter the foundations of molecular biology are introduced: structure of the nucleic acids, transcription, translation and methods to regulate gene expression. An introduction to Synthetic Biology completes the section. In the second chapter is described the synthetic genetic circuit that was conceived to make spontaneously emerge, from an isogenic microcolony of bacteria, two different groups of cells, termed leaders and followers. The circuit exploits the intrinsic stochasticity of gene expression and intercellular communication via small molecules to break the symmetry in the phenotype of the microcolony. The four modules of the circuit (coin flipper, sender, receiver and follower) and their interactions are then illustrated. 
In the third chapter is derived the mathematical representation of the various components of the circuit and the several simplifying assumptions are made explicit. Transcription and translation are modeled as a single step and gene expression is a function of the intracellular concentration of the various transcription factors that act on the different promoters of the circuit. A list of the various parameters and a justification for their value closes the chapter. In the fourth chapter are described the main characteristics of the gro simulation environment, developed by the Self Organizing Systems Laboratory of the University of Washington. Then, a sensitivity analysis performed to pinpoint the desirable characteristics of the various genetic components is detailed. The sensitivity analysis makes use of a cost function that is based on the fraction of cells in each one of the different possible states at the end of the simulation and the wanted outcome. Thanks to a particular kind of scatter plot, the parameters are ranked. Starting from an initial condition in which all the parameters assume their nominal value, the ranking suggests which parameter to tune in order to reach the goal. Obtaining a microcolony in which almost all the cells are in the follower state and only a few in the leader state seems to be the most difficult task. A small number of leader cells struggle to produce enough signal to turn the rest of the microcolony to the follower state. It is possible to obtain a microcolony in which the majority of cells are followers by increasing as much as possible the production of signal. Reaching the goal of a microcolony that is split in half between leaders and followers is comparatively easy. The best strategy seems to be increasing slightly the production of the enzyme. To end up with a majority of leaders, instead, it is advisable to increase the basal expression of the coin flipper module.
At the end of the chapter, a possible future application of the leader election circuit, the spontaneous formation of spatial patterns in a microcolony, is modeled with the finite state machine formalism. The gro simulations provide insights into the genetic components that are needed to implement the behavior. In particular, since both the examples of pattern formation rely on a local version of Leader Election, a short-range communication system is essential. Moreover, new synthetic components that allow to reliably downregulate the growth rate in specific cells without side effects need to be developed. In the appendix are listed the gro code utilized to simulate the model of the circuit, a script in the Python programming language that was used to split the simulations on a Linux cluster and the Matlab code developed to analyze the data.
Abstract:
Efficient planning of soil conservation measures requires, first, to understand the impact of soil erosion on soil fertility with regard to local land cover classes; and second, to identify hot spots of soil erosion and bright spots of soil conservation in a spatially explicit manner. Soil organic carbon (SOC) is an important indicator of soil fertility. The aim of this study was to conduct a spatial assessment of erosion and its impact on SOC for specific land cover classes. Input data consisted of extensive ground truth, a digital elevation model and Landsat 7 imagery from two different seasons. Soil spectral reflectance readings were taken from soil samples in the laboratory and calibrated with results of SOC chemical analysis using regression tree modelling. The resulting model statistics for soil degradation assessments are promising (R2=0.71, RMSEV=0.32). Since the area includes rugged terrain and small agricultural plots, the decision tree models allowed mapping of land cover classes, soil erosion incidence and SOC content classes at an acceptable level of accuracy for preliminary studies. The various datasets were linked in the hot-bright spot matrix, which was developed to combine soil erosion incidence information and SOC content levels (for uniform land cover classes) in a scatter plot. The quarters of the plot show different stages of degradation, from well conserved land to hot spots of soil degradation. The approach helps to gain a better understanding of the impact of soil erosion on soil fertility and to identify hot and bright spots in a spatially explicit manner. The results show distinctly lower SOC content levels on large parts of the test areas, where annual crop cultivation was dominant in the 1990s and where cultivation has now been abandoned. On the other hand, there are strong indications that afforestations and fruit orchards established in the 1980s have been successful in conserving soil resources.
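The quadrant logic of a hot-bright spot matrix can be sketched as a simple classification on two axes. Everything specific below is an assumption for illustration: the thresholds, the units, the function name and the labels for the two intermediate quadrants are hypothetical and are not taken from the study.

```python
def classify_site(erosion_incidence, soc_content, erosion_threshold, soc_threshold):
    """Assign a site to one quadrant of an erosion-versus-SOC scatter plot."""
    high_erosion = erosion_incidence >= erosion_threshold
    high_soc = soc_content >= soc_threshold
    if not high_erosion and high_soc:
        return "bright spot"  # well conserved land
    if high_erosion and not high_soc:
        return "hot spot"  # degraded land
    if high_erosion and high_soc:
        return "eroding, SOC still high"  # hypothetical intermediate label
    return "low erosion, SOC low"  # hypothetical intermediate label


# Hypothetical sites: (erosion incidence as a fraction, SOC content in %).
sites = {"A": (0.1, 3.0), "B": (0.8, 0.9), "C": (0.7, 2.5), "D": (0.2, 1.0)}
labels = {
    name: classify_site(erosion, soc, erosion_threshold=0.5, soc_threshold=2.0)
    for name, (erosion, soc) in sites.items()
}
print(labels)
```

Comparing sites within one land cover class at a time, as the abstract specifies, keeps differences in SOC from being driven by land cover rather than erosion.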
Abstract:
The solid-state-physics technique of electron spin resonance (ESR) has been employed in an exploratory study of marine limestones and impact-related deposits from Cretaceous-Tertiary (KT) boundary sites including Spain (Sopelana and Caravaca), New Jersey (Bass River), the U.S. Atlantic continental margin (Blake Nose, ODP Leg 171B/1049/A), and several locations in Belize and southern Mexico within ~600 km of the Chicxulub crater. The ESR spectra of SO3(1-) (a radiation-induced point defect involving a sulfite ion substitutional for CO3(2-) which has trapped a positive charge) and Mn(2+) in calcite were singled out for analysis because they are unambiguously interpretable and relatively easy to record. ESR signal strengths of calcite-related SO3(1-) and Mn(2+) have been studied as functions of stratigraphic position in whole-rock samples across the KT boundary at Sopelana, Caravaca, and Blake Nose. At all three of these sites, anomalies in SO3(1-) and/or Mn(2+) intensities are noted at the KT boundary relative to the corresponding background levels in the rocks above and below. At Caravaca, the SO3(1-) background itself is found to be lower by a factor of 2.7 in the first 30,000 years of the Tertiary relative to its steady-state value in the last 15,000 years of the Cretaceous, indicating either an abrupt and quasi-permanent change in ocean chemistry (or temperature) or extinction of the marine biota primarily responsible for fixing sulfite in the late Cretaceous limestones. An exponential decrease in the Mn(2+) concentration per unit mass calcite, [Mn(2+)], as the KT boundary at Caravaca is approached from below (1/e characteristic length = 1.4 cm) is interpreted as a result of post-impact leaching of the seafloor. Absolute ESR quantitative analyses of proximal impact deposits from Belize and southern Mexico group naturally into three distinct fields in a two-dimensional [SO3(1-)]-versus-[Mn(2+)] scatter plot.
These fields contain (I) limestone ejecta clasts, (II) accretionary lapilli, and (III) a variety of SO3(1-)-depleted/Mn(2+)-enriched impact deposits. Data for the investigated non-impact-related Cretaceous and Tertiary marine limestones (Spain and Blake Nose) fall outside of these three fields. With reference to these non-impact deposits, fields I, II, and III can be respectively characterized as Mn(2+)-depleted, SO3(1-)-enhanced, and SO3(1-)-depleted. It is proposed that (1) field I represents calcites from the Yucatán Platform, and that the Mn(2+)-depleted signature can be used as an indicator of primary Chicxulub ejecta in deep marine environments and (2) field II represents calcites that include a component formed in the vapor plume, either from condensation in the presence of CO2/SO3(1-)-rich vapors, or reactions between CaO and CO2/SO3-rich vapors, and that this SO3(1-)-enhanced signature can be used as an indicator of impact vapor plume deposits. Given these two propositions, the ESR data for the Blake Nose deposits are ascribed to the presence of basal coarse calcitic Chicxulub ejecta clasts, while the finer components that are increasingly represented toward the top are interpreted to contain high-SO3(1-) calcite from the vapor plume. The apparently undisturbed Bass River deposit may contain even higher concentrations of vapor-plume calcite. None of the three components included in field III appear to be represented at distal, deep marine KT-boundary sites; this field may include several types of impact-related deposits of diverse origins and diagenetic histories.
Abstract:
Master's dissertation in Aquaculture, Unidade de Ciências e Tecnologias dos Recursos Aquáticos, Univ. do Algarve, 1997