16 results for statistical spatial analysis
in Helda - Digital Repository of University of Helsinki
Abstract:
Determination of the environmental factors controlling earth surface processes and landform patterns is one of the central themes in physical geography. However, the identification of the main drivers of the geomorphological phenomena is often challenging. Novel spatial analysis and modelling methods could provide new insights into the process-environment relationships. The objective of this research was to map and quantitatively analyse the occurrence of cryogenic phenomena in subarctic Finland. More precisely, utilising a grid-based approach, the distribution and abundance of periglacial landforms were modelled to identify important landscape-scale environmental factors. The study was performed using a comprehensive empirical data set of periglacial landforms from an area of 600 km² at a 25-ha resolution. The utilised statistical methods were generalized linear modelling (GLM) and hierarchical partitioning (HP). GLMs were used to produce distribution and abundance models and HP to reveal independently the most likely causal variables. The GLM models were assessed utilising statistical evaluation measures, prediction maps, field observations and the results of HP analyses. A total of 40 different landform types and subtypes were identified. Topographical, soil property and vegetation variables were the primary correlates for the occurrence and cover of active periglacial landforms on the landscape scale. In the model evaluation, most of the GLMs were shown to be robust, although the explanatory power, predictive ability and selected explanatory variables varied between the models. The great potential of the combination of a spatial grid system, terrain data and novel statistical techniques to map the occurrence of periglacial landforms was demonstrated in this study. 
GLM proved to be a useful modelling framework for testing the shapes of the response functions and significances of the environmental variables and the HP method helped to make better deductions of the important factors of earth surface processes. Hence, the numerical approach presented in this study can be a useful addition to the current range of techniques available to researchers to map and monitor different geographical phenomena.
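As a rough illustration of the kind of distribution model mentioned above, the following sketch fits a binomial GLM with a logit link to presence/absence data by Newton-Raphson iteration. The link function, the single predictor, the variable names and the data are all assumptions made for illustration; the abstract does not state which GLM family or predictors were used, and hierarchical partitioning is not shown.

```python
import math

def fit_logistic_glm(x, y, iterations=25):
    """Fit P(presence) = logistic(b0 + b1 * x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the score vector and the 2x2 information matrix.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + H^{-1} g
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical grid cells: a terrain variable and landform presence (1) or absence (0).
slope_deg = [1, 2, 3, 4, 5, 6, 7, 8]
present = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_glm(slope_deg, present)
```

A positive `b1` here would mean that the modelled probability of occurrence rises with the hypothetical terrain variable. In practice a statistics package would be used, and the fit would fail on perfectly separable data.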
Abstract:
The main objective of this study is to evaluate selected geophysical, structural and topographic methods on regional, local, and tunnel and borehole scales, as indicators of the properties of fracture zones or fractures relevant to groundwater flow. Such information serves, for example, groundwater exploration and prediction of the risk of groundwater inflow in underground construction. This study aims to address how the features detected by these methods link to groundwater flow in qualitative and semi-quantitative terms and how well the methods reveal properties of fracturing affecting groundwater flow in the studied sites. The investigated areas are: (1) the Päijänne Tunnel for water-conveyance whose study serves as a verification of structures identified on regional and local scales; (2) the Oitti fuel spill site, to telescope across scales and compare geometries of structural assessment; and (3) Leppävirta, where fracturing and hydrogeological environment have been studied on the scale of a drilled well. The methods applied in this study include: the interpretation of lineaments from topographic data and their comparison with aeromagnetic data; the analysis of geological structures mapped in the Päijänne Tunnel; borehole video surveying; groundwater inflow measurements; groundwater level observations; and information on the tunnel's deterioration as demonstrated by block falls. The study combined geological and geotechnical information on relevant factors governing groundwater inflow into a tunnel and indicators of fracturing, as well as environmental datasets as overlays for spatial analysis using GIS. Geophysical borehole logging and fluid logging were used in Leppävirta to compare the responses of different methods to fracturing and other geological features on the scale of a drilled well. Results from some of the geophysical measurements of boreholes were affected by the large diameter (gamma radiation) or uneven surface (caliper) of these structures. 
However, different anomalies indicating a more fractured upper part of the bedrock traversed by well HN4 in Leppävirta suggest that several methods can be used for detecting fracturing. Fracture trends appear to align similarly on different scales in the zone of the Päijänne Tunnel. For example, similarities of patterns were found between the regional magnetic trends, correlating with orientations of topographic lineaments interpreted as expressions of fracture zones. The same structural orientations as those of the larger structures on local or regional scales were observed in the tunnel, even though a match could not be made in every case. The size and orientation of the observation space (patch of terrain at the surface, tunnel section, or borehole), the characterization method, with its typical sensitivity, and the characteristics of the location all influence the identification of the fracture pattern. Through due consideration of the influence of the sampling geometry and by utilizing complementary fracture characterization methods in tandem, some of the complexities of the relationship between fracturing and groundwater flow can be addressed. The flow connections demonstrated by the response of the groundwater level in monitoring wells to the pressure decrease in the tunnel, and by the transport of MTBE through fractures in bedrock in Oitti, highlight the importance of protecting the tunnel water from a risk of contamination. In general, the largest values of drawdown occurred in monitoring wells closest to the tunnel and/or close to the topographically interpreted fracture zones. It seems that, to some degree, the rate of inflow shows a positive correlation with the level of reinforcement, as both are connected with the fracturing in the bedrock. 
The following geological features increased the vulnerability of tunnel sections to pollution, especially when several factors affected the same locations: (1) fractured bedrock, particularly with associated groundwater inflow; (2) thin or permeable overburden above fractured rock; (3) a hydraulically conductive layer underneath the surface soil; and (4) a relatively thin bedrock roof above the tunnel. The observed anisotropy of the geological media should ideally be taken into account in the assessment of vulnerability of tunnel sections and eventually for directing protective measures.
Abstract:
In this study we explore the concurrent, combined use of three research methods, statistical corpus analysis and two psycholinguistic experiments (a forced-choice and an acceptability rating task), using verbal synonymy in Finnish as a case in point. In addition to supporting conclusions from earlier studies concerning the relationships between corpus-based and experimental data (e.g., Featherston 2005), we show that each method adds to our understanding of the studied phenomenon, in a way which could not be achieved through any single method by itself. Most importantly, whereas relative rareness in a corpus is associated with dispreference in selection, such infrequency does not categorically entail substantially lower acceptability. Furthermore, we show that forced-choice and acceptability rating tasks pertain to distinct linguistic processes, with category-wise incommensurable scales of measurement, and should therefore be merged with caution, if at all.
Abstract:
FTIR spectroscopy (Fourier transform infrared spectroscopy) is a rapid analytical method. In Fourier instruments, the use of an interferometer makes it possible to measure the entire infrared frequency range in a few seconds. Using an FTIR spectrometer equipped with an ATR accessory requires hardly any sample preparation, so the method is also easy to use. The ATR accessory also makes it possible to analyse many different kinds of samples. An infrared spectrum can also be measured from samples for which traditional sample preparation methods cannot be used. The information obtained by FTIR spectroscopy is often combined with multivariate statistical analyses. Cluster analysis can be used to group the information obtained from spectra on the basis of similarity. In hierarchical cluster analysis, the similarity between objects is determined by calculating the distance between them. Principal component analysis reduces the dimensionality of the data and creates new, uncorrelated principal components. The principal components should retain as much as possible of the variation in the original data. The application potential of FTIR spectroscopy and multivariate methods has been studied extensively. In the food industry, for example, its suitability for quality control has been studied. The method has also been used to identify the chemical compositions of volatile oils and to detect chemotypes of oil plants. This study evaluated the use of the method in classifying extract samples of milk parsley (Peucedanum palustre; Finnish: suoputki). FTIR spectra of extract samples from different parts of the plant were compared with FTIR spectra measured from selected pure compounds. The characteristic absorption bands of the pure compounds were identified from their FTIR spectra. The wavenumber ranges of the intense bands in the furanocoumarin spectra were selected for the multivariate analyses. Multivariate analyses were also performed on the fingerprint region of the IR spectrum, in the wavenumber range 1785-725 cm⁻¹. 
An attempt was made to classify the extract samples according to their collection site and coumarin content. Grouping by collection site was observed, and in the analyses based on the wavenumber ranges of the bands it was explained mainly by coumarin contents. In these analyses the extract samples mostly grouped and separated according to their total coumarin contents. Grouping by collection site was also observed in the analyses of the wavenumber range 1785-725 cm⁻¹, but coumarin contents did not explain it. These groupings were possibly influenced by similar concentrations of other compounds in the samples. Other wavenumber ranges were also used in the analyses, but the results hardly differed from the earlier ones. Multivariate analyses of second-order derivative spectra from the fingerprint region did not noticeably change the results either. In further studies, the method used here could be developed further, for example by examining narrower, carefully selected wavenumber ranges of second-order derivative spectra in the multivariate analyses.
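The distance-based grouping described in this abstract can be sketched as a small agglomerative clustering routine. This is only an illustration: the Euclidean distance, the single-linkage merge rule and the absorbance values are assumptions, since the abstract does not say which distance measure or linkage was used.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two spectra sampled at the same wavenumbers."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def agglomerate(spectra, n_clusters):
    """Repeatedly merge the two closest clusters (single linkage)."""
    clusters = [[i] for i in range(len(spectra))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(spectra[i], spectra[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical absorbance values at four wavenumbers for five extract samples.
spectra = [
    [0.10, 0.80, 0.30, 0.05],
    [0.12, 0.78, 0.33, 0.06],
    [0.70, 0.20, 0.10, 0.60],
    [0.68, 0.22, 0.12, 0.58],
    [0.11, 0.82, 0.29, 0.04],
]
groups = agglomerate(spectra, 2)
```

With these made-up values, samples 0, 1 and 4 end up in one group and samples 2 and 3 in the other. Real spectra would normally be baseline-corrected and normalised first.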
Abstract:
Metabolomics is a rapidly growing research field that studies the response of biological systems to environmental factors, disease states and genetic modifications. It aims at measuring the complete set of endogenous metabolites, i.e. the metabolome, in a biological sample such as plasma or cells. Because metabolites are the intermediates and end products of biochemical reactions, metabolite compositions and metabolite levels in biological samples can provide a wealth of information on on-going processes in a living system. Due to the complexity of the metabolome, metabolomic analysis poses a challenge to analytical chemistry. Adequate sample preparation is critical to accurate and reproducible analysis, and the analytical techniques must have high resolution and sensitivity to allow detection of as many metabolites as possible. Furthermore, as the information contained in the metabolome is immense, the data set collected from metabolomic studies is very large. In order to extract the relevant information from such large data sets, efficient data processing and multivariate data analysis methods are needed. In the research presented in this thesis, metabolomics was used to study mechanisms of polymeric gene delivery to retinal pigment epithelial (RPE) cells. The aim of the study was to detect differences in metabolomic fingerprints between transfected cells and non-transfected controls, and thereafter to identify metabolites responsible for the discrimination. The plasmid pCMV-β was introduced into RPE cells using the vector polyethyleneimine (PEI). The samples were analyzed using high performance liquid chromatography (HPLC) and ultra performance liquid chromatography (UPLC) coupled to a triple quadrupole (QqQ) mass spectrometer (MS). The software MZmine was used for raw data processing and principal component analysis (PCA) was used in statistical data analysis. 
The results revealed differences in metabolomic fingerprints between transfected cells and non-transfected controls. However, reliable fingerprinting data could not be obtained because of low analysis repeatability. Therefore, no attempts were made to identify metabolites responsible for discrimination between sample groups. Repeatability and accuracy of analyses can be influenced by protocol optimization. However, in this study, optimization of analytical methods was hindered by the very small number of samples available for analysis. In conclusion, this study demonstrates that obtaining reliable fingerprinting data is technically demanding, and the protocols need to be thoroughly optimized in order to approach the goals of gaining information on mechanisms of gene delivery.
Abstract:
The study describes and analyzes Finland Swedes' attitudes to modern-day linguistic influence, and the relationship between informants' explicitly reported views and the implicit attitudes they express towards language influence. The methods are primarily sociolinguistic. For the analysis of opinions and attitudes I have further developed and tested a new tool in attitude research. With statistical correlation analysis of data collected through a quantitative survey I describe the views that Swedish-language Finns (N=500) report on the influence of English, on imports, and on domain loss. With experimental matched-guise techniques, I study Finland Swedes' (N=600) subconscious reactions to English imports in spoken text. My results show that the subconscious reactions in some respects differ markedly from the views informants explicitly report that they have: informants respond that they would like English words that come into Swedish to be replaced by Swedish replacement words, but in a matched-guise test on their subconscious attitudes, the informants consider English words in a Swedish context to have a positive effect. The topic is further dealt with in interviews where I examine 36 informants' implicit attitudes through interactional sociolinguistic analyses. This study comes close to pragmatic discourse analysis in its focus on pragmatic particles and modality. The study makes a rather strict distinction between explicitly expressed opinions and implicit, subconscious attitudes. The quantitative analyses suggest that the opinions we express can be tied to the explicit in language. The outcome of the matched-guise test shows that it is furthermore possible to find subconscious, implicit attitudes that people in actual situations rely on when they make decisions. The discourse analysis finds many subconscious signals, but it also shows that the signals arise in interaction with one's interlocutor, the situation, and the norms in the society. 
To account for this I have introduced the concept of socioconscious attitude. Socioconscious attitudes reflect not only the traditions and values the utterer grew up with, but also the speaker's relation to the social situation (s)he takes part in.
Abstract:
The aim of this study is to find out how urban segregation is connected to the differentiation in educational outcomes in public schools. The connection between urban structure and educational outcomes is studied on both the primary and secondary school level. The secondary purpose of this study is to find out whether the free school choice policy introduced in the mid-1990s has an effect on the educational outcomes in secondary schools or on the observed relationship between the urban structure and educational outcomes. The study is quantitative in nature, and the most important method used is statistical regression analysis. The educational outcome data, covering the years 1999 to 2002, were provided by the Finnish National Board of Education, and the data containing variables describing the social and physical structure of Helsinki were provided by Statistics Finland and City of Helsinki Urban Facts. The central observation is that there is a clear connection between urban segregation and differences in educational outcomes in public schools. With variables describing urban structure, it is possible to statistically explain up to 70 % of the variation in educational outcomes in the primary schools and 60 % of the variation in educational outcomes in the secondary schools. The most significant variables in relation to low educational outcomes in Helsinki are abundance of public housing, low educational status of the adult population and high numbers of immigrants in the school's catchment area. The regression model has been constructed using these variables. The lower coefficient of determination in the educational outcomes of secondary schools is mostly due to the effects of secondary school choice. Studying the public school market revealed that students selecting a secondary school outside their local catchment area cause an increase in the variation of the educational outcomes between secondary schools. 
When the number of students selecting a school outside their local catchment area is taken into account in the regression model, it is possible to explain up to 80 % of the variation in educational outcomes in the secondary schools in Helsinki.
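The share of variation "explained" in this abstract is the coefficient of determination, R². As a minimal sketch of how that figure is obtained, the following fits a one-predictor least-squares line and computes R². The study itself used several predictors; the single predictor, the variable names and the numbers below are invented for illustration.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical catchment areas: share of public housing (%) and mean school grade.
public_housing_share = [5, 10, 20, 30, 40]
mean_grade = [8.1, 7.9, 7.2, 6.9, 6.0]
r2 = r_squared(public_housing_share, mean_grade)
```

An R² of 0.6 corresponds to the abstract's "60 % of the variation". Adding an informative predictor, such as the number of students choosing a school outside their catchment area, can raise it.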
Abstract:
In genetic epidemiology, population-based disease registries are commonly used to collect genotype or other risk factor information concerning affected subjects and their relatives. This work presents two new approaches for the statistical inference of ascertained data: conditional and full likelihood approaches for a disease with a variable age-at-onset phenotype, using familial data obtained from a population-based registry of incident cases. The aim is to obtain statistically reliable estimates of the general population parameters. The statistical analysis of familial data with variable age at onset becomes more complicated when some of the study subjects are non-susceptible, that is to say these subjects never get the disease. A statistical model for a variable age at onset with long-term survivors is proposed for studies of familial aggregation, using a latent variable approach, as well as for prospective genetic association studies with candidate genes. In addition, we explore the possibility of a genetic explanation of the observed increase in the incidence of Type 1 diabetes (T1D) in Finland in recent decades and the hypothesis of non-Mendelian transmission of T1D-associated genes. Both classical and Bayesian statistical inference were used in the modelling and estimation. Although this work contains five studies with different statistical models, they all concern data obtained from nationwide registries of T1D and the genetics of T1D. In the analyses of T1D data, non-Mendelian transmission of T1D susceptibility alleles was not observed. In addition, non-Mendelian transmission of T1D susceptibility genes did not provide a plausible explanation for the increase in T1D incidence in Finland. Instead, the Human Leucocyte Antigen associations with T1D were confirmed in the population-based analysis, which combines T1D registry information, a reference sample of healthy subjects and birth cohort information of the Finnish population. 
Finally, a substantial familial variation in the susceptibility of T1D nephropathy was observed. The presented studies show the benefits of sophisticated statistical modelling to explore risk factors for complex diseases.
Abstract:
Digital elevation models (DEMs) have been an important topic in geography and surveying sciences for decades due to their geomorphological importance as the reference surface for gravitation-driven material flow, as well as the wide range of uses and applications. When a DEM is used in terrain analysis, for example in automatic drainage basin delineation, errors of the model accumulate in the analysis results. Investigation of this phenomenon is known as error propagation analysis, which has a direct influence on the decision-making process based on interpretations and applications of terrain analysis. Additionally, it may have an indirect influence on data acquisition and the DEM generation. The focus of the thesis was on the fine toposcale DEMs, which are typically represented in a 5-50 m grid and used in the application scale 1:10 000-1:50 000. The thesis presents a three-step framework for investigating error propagation in DEM-based terrain analysis. The framework includes methods for visualising the morphological gross errors of DEMs, exploring the statistical and spatial characteristics of the DEM error, making analytical and simulation-based error propagation analysis and interpreting the error propagation analysis results. The DEM error model was built using geostatistical methods. The results show that appropriate and exhaustive reporting of various aspects of fine toposcale DEM error is a complex task. This is due to the high number of outliers in the error distribution and morphological gross errors, which are detectable with the presented visualisation methods. In addition, the use of global characterisation of DEM error is a gross generalisation of reality due to the small extent of the areas in which the decision of stationarity is not violated. This was shown using an exhaustive high-quality reference DEM based on airborne laser scanning and local semivariogram analysis. 
The error propagation analysis revealed that, as expected, an increase in the DEM vertical error will increase the error in surface derivatives. However, contrary to expectations, the spatial autocorrelation of the model appears to have varying effects on the error propagation analysis depending on the application. The use of a spatially uncorrelated DEM error model has been considered as a 'worst-case scenario', but this opinion is now challenged because none of the DEM derivatives investigated in the study had maximum variation with spatially uncorrelated random error. Significant performance improvement was achieved in simulation-based error propagation analysis by applying process convolution in generating realisations of the DEM error model. In addition, a typology of uncertainty in drainage basin delineations is presented.
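To make the distinction between analytical and simulation-based error propagation concrete, the sketch below propagates a vertical DEM error into a central-difference gradient in two ways: by Monte Carlo simulation and by the first-order variance formula. It covers only the simplest case of independent Gaussian errors at two neighbouring cells. The function names, the cell size and the error magnitude are assumptions, and the thesis's geostatistical error model and process convolution are not reproduced here.

```python
import math
import random

def simulate_gradient_sd(sigma, cell_size, n_runs=20000, seed=42):
    """Monte Carlo standard deviation of the x-gradient error."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_runs):
        # Independent vertical errors at the west and east neighbour cells.
        e_west = rng.gauss(0.0, sigma)
        e_east = rng.gauss(0.0, sigma)
        # The true elevations cancel, so only the error difference remains.
        errors.append((e_east - e_west) / (2.0 * cell_size))
    mean = sum(errors) / n_runs
    return math.sqrt(sum((e - mean) ** 2 for e in errors) / (n_runs - 1))

def analytic_gradient_sd(sigma, cell_size):
    """First-order propagation: Var = 2 * sigma^2 / (2 * cell_size)^2."""
    return sigma / (math.sqrt(2.0) * cell_size)

# Hypothetical values: 1 m vertical error on a 10 m grid.
simulated = simulate_gradient_sd(sigma=1.0, cell_size=10.0)
analytic = analytic_gradient_sd(sigma=1.0, cell_size=10.0)
```

In this simplified setting the gradient error grows in proportion to the vertical error, in line with the first finding above. The effect of spatial autocorrelation depends on the error model and the derivative in question, and is outside the scope of this sketch.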
Abstract:
This thesis presents novel modelling applications for environmental geospatial data using remote sensing, GIS and statistical modelling techniques. The studies fall under four main themes: (i) to develop advanced geospatial databases. Paper (I) demonstrates the creation of a geospatial database for the Glanville fritillary butterfly (Melitaea cinxia) in the Åland Islands, south-western Finland; (ii) to analyse species diversity and distribution using GIS techniques. Paper (II) presents a diversity and geographical distribution analysis for Scopulini moths at a world-wide scale; (iii) to study spatiotemporal forest cover change. Paper (III) presents a study of exotic and indigenous tree cover change detection in the Taita Hills, Kenya, using airborne imagery and GIS analysis techniques; (iv) to explore predictive modelling techniques using geospatial data. In Paper (IV) human population occurrence and abundance in the Taita Hills highlands was predicted using the generalized additive modelling (GAM) technique. Paper (V) presents techniques to enhance fire prediction and burned area estimation at a regional scale in East Caprivi, Namibia. Paper (VI) compares eight state-of-the-art predictive modelling methods to improve fire prediction, burned area estimation and fire risk mapping in East Caprivi, Namibia. The results in Paper (I) showed that geospatial data can be managed effectively using advanced relational database management systems. Metapopulation data for the Melitaea cinxia butterfly were successfully combined with GPS-delimited habitat patch information and climatic data. Using the geospatial database, spatial analyses were successfully conducted at habitat patch level or at coarser analysis scales. Moreover, this study indicated that, at a large scale, spatially correlated weather conditions are one of the primary causes of spatially correlated changes in Melitaea cinxia population sizes. 
In Paper (II) spatiotemporal characteristics of Scopulini moth description, diversity and distribution were analysed at a world-wide scale, and for the first time GIS techniques were used for Scopulini moth geographical distribution analysis. This study revealed that Scopulini moths have a cosmopolitan distribution. The majority of the species have been described from the low latitudes, sub-Saharan Africa being the hot spot of species diversity. However, the taxonomical effort has been uneven among biogeographical regions. Paper (III) showed that forest cover change can be analysed in great detail using modern airborne imagery techniques and historical aerial photographs. However, when spatiotemporal forest cover change is studied, care has to be taken in co-registration and image interpretation when historical black and white aerial photography is used. In Paper (IV) human population distribution and abundance could be modelled with fairly good results using geospatial predictors and non-Gaussian predictive modelling techniques. Moreover, a land cover layer is not necessarily needed as a predictor, because first and second-order image texture measurements derived from satellite imagery had more power to explain the variation in dwelling unit occurrence and abundance. Paper (V) showed that the generalized linear model (GLM) is a suitable technique for fire occurrence prediction and for burned area estimation. GLM-based burned area estimations were found to be superior to the existing MODIS burned area product (MCD45A1). However, spatial autocorrelation of fires has to be taken into account when using the GLM technique for fire occurrence prediction. Paper (VI) showed that novel statistical predictive modelling techniques can be used to improve fire prediction, burned area estimation and fire risk mapping at a regional scale. However, there was some noticeable variation between different predictive modelling techniques for fire occurrence prediction and burned area estimation.
Abstract:
Bacteria play an important role in many ecological systems. The molecular characterization of bacteria using either cultivation-dependent or cultivation-independent methods reveals the large scale of bacterial diversity in natural communities, and the vastness of subpopulations within a species or genus. Understanding how bacterial diversity varies across different environments and also within populations should provide insights into many important questions of bacterial evolution and population dynamics. This thesis presents novel statistical methods for analyzing bacterial diversity using widely employed molecular fingerprinting techniques. The first objective of this thesis was to develop Bayesian clustering models to identify bacterial population structures. Bacterial isolates were identified using multilocus sequence typing (MLST), and Bayesian clustering models were used to explore the evolutionary relationships among isolates. Our method involves the inference of genetic population structures via an unsupervised clustering framework where the dependence between loci is represented using graphical models. The population dynamics that generate such a population stratification were investigated using a stochastic model, in which homologous recombination between subpopulations can be quantified within a gene flow network. The second part of the thesis focuses on cluster analysis of community compositional data produced by two different cultivation-independent analyses: terminal restriction fragment length polymorphism (T-RFLP) analysis, and fatty acid methyl ester (FAME) analysis. The cluster analysis aims to group bacterial communities that are similar in composition, which is an important step for understanding the overall influences of environmental and ecological perturbations on bacterial diversity. 
A common feature of T-RFLP and FAME data is zero-inflation, which indicates that the observation of a zero value is much more frequent than would be expected, for example, from a Poisson distribution in the discrete case, or a Gaussian distribution in the continuous case. We provided two strategies for modeling zero-inflation in the clustering framework, which were validated by both synthetic and empirical complex data sets. We show in the thesis that our model, which takes into account dependencies between loci in MLST data, can produce better clustering results than methods which assume independent loci. Furthermore, computer algorithms that are efficient in analyzing large-scale data were adopted to meet the increasing computational need. Our method for detecting homologous recombination in subpopulations may provide a theoretical criterion for defining bacterial species. The clustering of bacterial community data, including T-RFLP and FAME data, provides an initial effort towards discovering the evolutionary dynamics that structure and maintain bacterial diversity in the natural environment.
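The notion of zero-inflation in the discrete case can be illustrated with the textbook zero-inflated Poisson distribution, which mixes a point mass at zero with an ordinary Poisson. This is a generic sketch, not either of the two strategies developed in the thesis (the abstract does not describe them); the parameter names `pi_zero` and `lam` and their values are assumptions.

```python
import math

def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing the count k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def zip_pmf(k, pi_zero, lam):
    """Zero-inflated Poisson: extra probability mass pi_zero placed at zero."""
    base = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base

# Hypothetical parameters: 40 % structural zeros, mean count 3 otherwise.
p_zero_plain = poisson_pmf(0, 3.0)
p_zero_inflated = zip_pmf(0, 0.4, 3.0)
total = sum(zip_pmf(k, 0.4, 3.0) for k in range(60))
```

Under these made-up parameters the zero-inflated model assigns far more probability to a zero count than the plain Poisson does, which is the pattern the abstract describes for T-RFLP and FAME data.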
Abstract:
In this Thesis, we develop theory and methods for computational data analysis. The problems in data analysis are approached from three perspectives: statistical learning theory, the Bayesian framework, and the information-theoretic minimum description length (MDL) principle. Contributions in statistical learning theory address the possibility of generalization to unseen cases, and regression analysis with partially observed data with an application to mobile device positioning. In the second part of the Thesis, we discuss so-called Bayesian network classifiers, and show that they are closely related to logistic regression models. In the final part, we apply the MDL principle to tracing the history of old manuscripts, and to noise reduction in digital signals.
Abstract:
In meteorology, observations and forecasts of a wide range of phenomena (for example, snow, clouds, hail, fog, and tornadoes) can be categorical, that is, they can only have discrete values (e.g., "snow" and "no snow"). Concentrating on satellite-based snow and cloud analyses, this thesis explores methods that have been developed for evaluation of categorical products and analyses. Different algorithms for satellite products generate different results; sometimes the differences are subtle, sometimes all too visible. In addition to differences between algorithms, the satellite products are influenced by physical processes and conditions, such as diurnal and seasonal variation in solar radiation, topography, and land use. The evaluation of satellite-based snow cover analyses from NOAA, NASA, and EUMETSAT, and snow analyses for numerical weather prediction models from FMI and ECMWF, was complicated by the fact that we did not have true knowledge of the snow extent, and we were forced simply to measure the agreement between different products. The Sammon mapping, a multidimensional scaling method, was then used to visualize the differences between different products. The trustworthiness of the results for cloud analyses [EUMETSAT Meteorological Products Extraction Facility cloud mask (MPEF), together with the Nowcasting Satellite Application Facility (SAFNWC) cloud masks provided by Météo-France (SAFNWC/MSG) and the Swedish Meteorological and Hydrological Institute (SAFNWC/PPS)] compared with ceilometers of the Helsinki Testbed was estimated by constructing confidence intervals (CIs). Bootstrapping, a statistical resampling method, was used to construct CIs, especially in the presence of spatial and temporal correlation. The reference data for validation are constantly in short supply. In general, the needs of a particular project drive the requirements for evaluation, for example, for the accuracy and the timeliness of the particular data and methods. 
In this vein, we discuss tentatively how data provided by the general public, e.g., photos shared on the Internet photo-sharing service Flickr, can be used as a new source for validation. Results show that they are of reasonable quality and their use for case studies can be warmly recommended. Last, the use of cluster analysis on meteorological in-situ measurements was explored. The Autoclass algorithm was used to construct compact representations of synoptic conditions of fog at Finnish airports.
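One common way to bootstrap a confidence interval when observations are correlated in time is to resample whole blocks of consecutive values rather than single values. The sketch below applies a moving-block bootstrap with a percentile interval to a series of 0/1 agreement flags between two categorical products. The block length, the percentile method and the data are assumptions for illustration; the abstract does not specify which bootstrap variant was used.

```python
import random

def block_bootstrap_ci(agree, block_len=5, n_boot=2000, alpha=0.05, seed=1):
    """Percentile CI for the mean of a 0/1 series using moving blocks."""
    rng = random.Random(seed)
    n = len(agree)
    starts = range(n - block_len + 1)
    means = []
    for _ in range(n_boot):
        sample = []
        # Concatenate randomly chosen blocks until the series length is reached.
        while len(sample) < n:
            s = rng.choice(starts)
            sample.extend(agree[s:s + block_len])
        sample = sample[:n]
        means.append(sum(sample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical hourly flags: 1 if the satellite cloud mask agrees with the ceilometer.
agreement = [1] * 8 + [0] * 3 + [1] * 10 + [0] * 2 + [1] * 12 + [0] * 5
lower, upper = block_bootstrap_ci(agreement)
```

Resampling blocks keeps runs of consecutive agreement or disagreement together, so the interval is usually wider than one from an ordinary bootstrap that treats the hours as independent.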