970 resultados para Datasets
Resumo:
In this paper both documentary and natural proxy data have been used to improve the accuracy of palaeoclimatic knowledge in Finland since the 18th century. Early meteorological observations from Turku (1748-1800) were analyzed first as a potential source of climate variability. The reliability of the calculated mean temperatures was evaluated by comparing them with those of contemporary temperature records from Stockholm, St. Petersburg and Uppsala. The resulting monthly, seasonal and yearly mean temperatures from 1748 to 1800 were compared with the present day mean values (1961-1990): the comparison suggests that the winters of the period 1749-1800 were 0.8 ºC colder than today, while the summers were 0.4 ºC warmer. Over the same period, springs were 0.9 ºC and autumns 0.1 ºC colder than today. Despite their uncertainties when compared with modern meteorological data, early temperature measurements offer direct and daily information about the weather for all months of the year, in contrast with other proxies. Secondly, early meteorological observations from Tornio (1737-1749) and Ylitornio (1792-1838) were used to study the temporal behaviour of the climate-tree growth relationship during the past three centuries in northern Finland. Analyses showed that the correlations between ring widths and mid-summer (July) temperatures did not vary significantly as a function of time. Early (June) and late summer (August) mean temperatures were secondary to mid-summer temperatures in controlling the radial growth. According the dataset used, there was no clear signature of temporally reduced sensitivity of Scots pine ring widths to mid-summer temperatures over the periods of early and modern meteorological observations. Thirdly, plant phenological data with tree-rings from south-west Finland since 1750 were examined as a palaeoclimate indicator. The information from the fragmentary, partly overlapping, partly nonsystematically biased plant phenological records of 14 different phenomena were combined into one continuous time series of phenological indices. The indices were found to be reliable indicators of the February to June temperature variations. In contrast, there was no correlation between the phenological indices and the precipitation data. Moreover, the correlations between the studied tree-rings and spring temperatures varied as a function of time and hence, their use in palaeoclimate reconstruction is questionable. The use of present tree-ring datasets for palaeoclimate purposes may become possible after the application of more sophisticated calibration methods. Climate variability since the 18th century is perhaps best seen in the fourth paper study of the multiproxy spring temperature reconstruction of south-west Finland. With the help of transfer functions, an attempt has been made to utilize both documentary and natural proxies. The reconstruction was verified with statistics showing a high degree of validity between the reconstructed and observed temperatures. According to the proxies and modern meteorological observations from Turku, springs have become warmer and have featured a warming trend since around the 1850s. Over the period of 1750 to around 1850, springs featured larger multidecadal low-frequency variability, as well as a smaller range of annual temperature variations. The coldest springtimes occurred around the 1840s and 1850s and the first decade of the 19th century. Particularly warm periods occurred in the 1760s, 1790s, 1820s, 1930s, 1970s and from 1987 onwards, although in this period cold springs occurred, such as the springs of 1994 and 1996. On the basis of the available material, long-term temperature changes have been related to changes in the atmospheric circulation, such as the North Atlantic Oscillation (February-June).
Resumo:
Contamination of urban streams is a rising topic worldwide, but the assessment and investigation of stormwater induced contamination is limited by the high amount of water quality data needed to obtain reliable results. In this study, stream bed sediments were studied to determine their contamination degree and their applicability in monitoring aquatic metal contamination in urban areas. The interpretation of sedimentary metal concentrations is, however, not straightforward, since the concentrations commonly show spatial and temporal variations as a response to natural processes. The variations of and controls on metal concentrations were examined at different scales to increase the understanding of the usefulness of sediment metal concentrations in detecting anthropogenic metal contamination patterns. The acid extractable concentrations of Zn, Cu, Pb and Cd were determined from the surface sediments and water of small streams in the Helsinki Metropolitan region, southern Finland. The data consists of two datasets: sediment samples from 53 sites located in the catchment of the Stream Gräsanoja and sediment and water samples from 67 independent catchments scattered around the metropolitan region. Moreover, the sediment samples were analyzed for their physical and chemical composition (e.g. total organic carbon, clay-%, Al, Li, Fe, Mn) and the speciation of metals (in the dataset of the Stream Gräsanoja). The metal concentrations revealed that the stream sediments were moderately contaminated and caused no immediate threat to the biota. However, at some sites the sediments appeared to be polluted with Cu or Zn. The metal concentrations increased with increasing intensity of urbanization, but site specific factors, such as point sources, were responsible for the occurrence of the highest metal concentrations. The sediment analyses revealed, thus a need for more detailed studies on the processes and factors that cause the hot spot metal concentrations. The sediment composition and metal speciation analyses indicated that organic matter is a very strong indirect control on metal concentrations, and it should be accounted for when studying anthropogenic metal contamination patterns. The fine-scale spatial and temporal variations of metal concentrations were low enough to allow meaningful interpretation of substantial metal concentration differences between sites. Furthermore, the metal concentrations in the stream bed sediments were correlated with the urbanization of the catchment better than the total metal concentrations in the water phase. These results suggest that stream sediments show true potential for wider use in detecting the spatial differences in metal contamination of urban streams. Consequently, using the sediment approach regional estimates of the stormwater related metal contamination could be obtained fairly cost-effectively, and the stability and reliability of results would be higher compared to analyses of single water samples. Nevertheless, water samples are essential in analysing the dissolved concentrations of metals, momentary discharges from point sources in particular.
Resumo:
The main objective of this study is to evaluate selected geophysical, structural and topographic methods on regional, local, and tunnel and borehole scales, as indicators of the properties of fracture zones or fractures relevant to groundwater flow. Such information serves, for example, groundwater exploration and prediction of the risk of groundwater inflow in underground construction. This study aims to address how the features detected by these methods link to groundwater flow in qualitative and semi-quantitative terms and how well the methods reveal properties of fracturing affecting groundwater flow in the studied sites. The investigated areas are: (1) the Päijänne Tunnel for water-conveyance whose study serves as a verification of structures identified on regional and local scales; (2) the Oitti fuel spill site, to telescope across scales and compare geometries of structural assessment; and (3) Leppävirta, where fracturing and hydrogeological environment have been studied on the scale of a drilled well. The methods applied in this study include: the interpretation of lineaments from topographic data and their comparison with aeromagnetic data; the analysis of geological structures mapped in the Päijänne Tunnel; borehole video surveying; groundwater inflow measurements; groundwater level observations; and information on the tunnel s deterioration as demonstrated by block falls. The study combined geological and geotechnical information on relevant factors governing groundwater inflow into a tunnel and indicators of fracturing, as well as environmental datasets as overlays for spatial analysis using GIS. Geophysical borehole logging and fluid logging were used in Leppävirta to compare the responses of different methods to fracturing and other geological features on the scale of a drilled well. Results from some of the geophysical measurements of boreholes were affected by the large diameter (gamma radiation) or uneven surface (caliper) of these structures. However, different anomalies indicating more fractured upper part of the bedrock traversed by well HN4 in Leppävirta suggest that several methods can be used for detecting fracturing. Fracture trends appear to align similarly on different scales in the zone of the Päijänne Tunnel. For example, similarities of patterns were found between the regional magnetic trends, correlating with orientations of topographic lineaments interpreted as expressions of fracture zones. The same structural orientations as those of the larger structures on local or regional scales were observed in the tunnel, even though a match could not be made in every case. The size and orientation of the observation space (patch of terrain at the surface, tunnel section, or borehole), the characterization method, with its typical sensitivity, and the characteristics of the location, influence the identification of the fracture pattern. Through due consideration of the influence of the sampling geometry and by utilizing complementary fracture characterization methods in tandem, some of the complexities of the relationship between fracturing and groundwater flow can be addressed. The flow connections demonstrated by the response of the groundwater level in monitoring wells to pressure decrease in the tunnel and the transport of MTBE through fractures in bedrock in Oitti, highlight the importance of protecting the tunnel water from a risk of contamination. In general, the largest values of drawdown occurred in monitoring wells closest to the tunnel and/or close to the topographically interpreted fracture zones. It seems that, to some degree, the rate of inflow shows a positive correlation with the level of reinforcement, as both are connected with the fracturing in the bedrock. The following geological features increased the vulnerability of tunnel sections to pollution, especially when several factors affected the same locations: (1) fractured bedrock, particularly with associated groundwater inflow; (2) thin or permeable overburden above fractured rock; (3) a hydraulically conductive layer underneath the surface soil; and (4) a relatively thin bedrock roof above the tunnel. The observed anisotropy of the geological media should ideally be taken into account in the assessment of vulnerability of tunnel sections and eventually for directing protective measures.
Resumo:
Background and Aim The etiology of Crohn's disease (CD) implicates both genetic and environmental factors. Smoking behavior is one environmental risk factor to play a role in the development of CD. The study aimed to assess the contribution of the interleukin 23 receptor (IL23R) in determining disease susceptibility in two independent cohorts of CD, and to investigate the interactions between IL23R variants, smoking behavior, and CD-associated genes, NOD2 and ATG16L1. Methods Ten IL23R single-nucleotide polymorphisms (SNPs) were genotyped in 675 CD cases, and 1255 controls from Brisbane, Australia (dataset 1). Six of these SNPs were genotyped in 318 CD cases and 533 controls from Canterbury, New Zealand (dataset 2). Case–control analysis of genotype and allele frequencies, and haplotype analysis for all SNPs was conducted. Results We demonstrate a strong increased CD risk for smokers in both datasets (odds ratio 3.77, 95% confidence interval 2.88–4.94), and an additive interaction between IL23R SNPs and cigarette smoking. Ileal involvement was a consistent marker of strong SNP–CD association (P ≤ 0.001), while the lowest minor allele frequencies for location were found in those with colonic CD (L2). Three haplotype blocks were identified across the 10 IL23R SNPs conferring different risk of CD. Haplotypes conferred no further risk of CD when compared with single SNP analyses. Conclusion IL23R gene variants determine CD susceptibility in the Australian and New Zealand population, particularly ileal CD. A strong additive interaction exists between IL23R SNPs and smoking behavior resulting in a dramatic increase in disease risk depending upon specific genetic background.
Resumo:
There is an increasing requirement for more astute land resource management through efficiencies in agricultural inputs in a sugar cane production system. A precision agriculture (PA) approach can provide a pathway for a sustainable sugarcane production system. One of the impediments to the adoption of PA practices is access to paddock-scale mapping layers displaying variability in soil properties, crop growth and surface drainage. Variable rate application (VRA) of nutrients is an important component of PA. However, agronomic expertise within PA systems has fallen well behind significant advances in PA technologies. Generally, advisers in the sugar industry have a poor comprehension of the complex interaction of variables that contribute to within-paddock variations in crop growth. This is regarded as a significant impediment to the progression of PA in sugarcane and is one of the reasons for the poor adoption of VRA of nutrients in a PA approach to improved sugar cane production. This project therefore has established a number of key objectives which will contribute to the adoption of PA and the staged progression of VRA supported by relevant and practical agronomic expertise. These objectives include provision of base soils attribute mapping that can be determined using Veris 3100 Electrical Conductivity (EC) and digital elevation datasets using GPS mapping technology for a large sector of the central cane growing region using analysis of archived satellite imagery to determine the location and stability of yield patterns over time and in varying seasonal conditions on selected project study sites. They also include the stablishment of experiments to determine appropriate VRA nitrogen rates on various soil types subjected to extended anaerobic conditions, and the establishment of trials to determine nitrogen rates applicable to a declining yield potential associated with the aging of ratoons in the crop cycle. Preliminary analysis of archived yield estimation data indicates that yield patterns remain relatively stable overtime. Results also indicate the where there is considerable variability in EC values there is also significant variation in yield.
Resumo:
User generated information such as product reviews have been booming due to the advent of web 2.0. In particular, rich information associated with reviewed products has been buried in such big data. In order to facilitate identifying useful information from product (e.g., cameras) reviews, opinion mining has been proposed and widely used in recent years. In detail, as the most critical step of opinion mining, feature extraction aims to extract significant product features from review texts. However, most existing approaches only find individual features rather than identifying the hierarchical relationships between the product features. In this paper, we propose an approach which finds both features and feature relationships, structured as a feature hierarchy which is referred to as feature taxonomy in the remainder of the paper. Specifically, by making use of frequent patterns and association rules, we construct the feature taxonomy to profile the product at multiple levels instead of single level, which provides more detailed information about the product. The experiment which has been conducted based upon some real world review datasets shows that our proposed method is capable of identifying product features and relations effectively.
Resumo:
This paper presents an effective feature representation method in the context of activity recognition. Efficient and effective feature representation plays a crucial role not only in activity recognition, but also in a wide range of applications such as motion analysis, tracking, 3D scene understanding etc. In the context of activity recognition, local features are increasingly popular for representing videos because of their simplicity and efficiency. While they achieve state-of-the-art performance with low computational requirements, their performance is still limited for real world applications due to a lack of contextual information and models not being tailored to specific activities. We propose a new activity representation framework to address the shortcomings of the popular, but simple bag-of-words approach. In our framework, first multiple instance SVM (mi-SVM) is used to identify positive features for each action category and the k-means algorithm is used to generate a codebook. Then locality-constrained linear coding is used to encode the features into the generated codebook, followed by spatio-temporal pyramid pooling to convey the spatio-temporal statistics. Finally, an SVM is used to classify the videos. Experiments carried out on two popular datasets with varying complexity demonstrate significant performance improvement over the base-line bag-of-feature method.
Resumo:
An integrated approach of using strandings and bycatch data may provide an indicator of long-term trends for data-limited cetaceans. Strandings programs can give a faithful representation of the species composition of cetacean assemblages, while standardised bycatch rates can provide a measure of relative abundance. Comparing the two datasets may also facilitate managing impacts by understanding which species, sex or sizes are the most vulnerable to interactions with fisheries gear. Here we apply this approach to two long-term datasets in East Australia, bycatch in the Queensland Shark Control Program QSCP, 1992–2012) and strandings in the Queensland Marine Wildlife Strandings and Mortality Program StrandNet, 1996–2012). Short-beaked common dolphins, Delphinus delphis, were markedly more frequent in bycatch than in the strandings dataset, suggesting that they are more prone to being incidentally caught than other cetacean species in the region. The reverse was true for humpback whales, Megaptera novaeangliae, bottlenose dolphins, Tursiops spp.; and species predominantly found in offshore waters. QSCP bycatch was strongly skewed towards females for short-beaked common dolphins, and towards smaller sizes for Australian humpback dolphins, Sousa sahulensis. Overall, both datasets demonstrated similar seasonality and a similar long-term increase from 1996 until 2008. Analysis on a species-by-species basis was then used to explore potential explanations for long-term trends, which ranged from a recovering stock (humpback whales) to a shift in habitat use (short-beaked common dolphins).
Resumo:
In this thesis we present and evaluate two pattern matching based methods for answer extraction in textual question answering systems. A textual question answering system is a system that seeks answers to natural language questions from unstructured text. Textual question answering systems are an important research problem because as the amount of natural language text in digital format grows all the time, the need for novel methods for pinpointing important knowledge from the vast textual databases becomes more and more urgent. We concentrate on developing methods for the automatic creation of answer extraction patterns. A new type of extraction pattern is developed also. The pattern matching based approach chosen is interesting because of its language and application independence. The answer extraction methods are developed in the framework of our own question answering system. Publicly available datasets in English are used as training and evaluation data for the methods. The techniques developed are based on the well known methods of sequence alignment and hierarchical clustering. The similarity metric used is based on edit distance. The main conclusions of the research are that answer extraction patterns consisting of the most important words of the question and of the following information extracted from the answer context: plain words, part-of-speech tags, punctuation marks and capitalization patterns, can be used in the answer extraction module of a question answering system. This type of patterns and the two new methods for generating answer extraction patterns provide average results when compared to those produced by other systems using the same dataset. However, most answer extraction methods in the question answering systems tested with the same dataset are both hand crafted and based on a system-specific and fine-grained question classification. The the new methods developed in this thesis require no manual creation of answer extraction patterns. As a source of knowledge, they require a dataset of sample questions and answers, as well as a set of text documents that contain answers to most of the questions. The question classification used in the training data is a standard one and provided already in the publicly available data.
Resumo:
This thesis studies human gene expression space using high throughput gene expression data from DNA microarrays. In molecular biology, high throughput techniques allow numerical measurements of expression of tens of thousands of genes simultaneously. In a single study, this data is traditionally obtained from a limited number of sample types with a small number of replicates. For organism-wide analysis, this data has been largely unavailable and the global structure of human transcriptome has remained unknown. This thesis introduces a human transcriptome map of different biological entities and analysis of its general structure. The map is constructed from gene expression data from the two largest public microarray data repositories, GEO and ArrayExpress. The creation of this map contributed to the development of ArrayExpress by identifying and retrofitting the previously unusable and missing data and by improving the access to its data. It also contributed to creation of several new tools for microarray data manipulation and establishment of data exchange between GEO and ArrayExpress. The data integration for the global map required creation of a new large ontology of human cell types, disease states, organism parts and cell lines. The ontology was used in a new text mining and decision tree based method for automatic conversion of human readable free text microarray data annotations into categorised format. The data comparability and minimisation of the systematic measurement errors that are characteristic to each lab- oratory in this large cross-laboratories integrated dataset, was ensured by computation of a range of microarray data quality metrics and exclusion of incomparable data. The structure of a global map of human gene expression was then explored by principal component analysis and hierarchical clustering using heuristics and help from another purpose built sample ontology. A preface and motivation to the construction and analysis of a global map of human gene expression is given by analysis of two microarray datasets of human malignant melanoma. The analysis of these sets incorporate indirect comparison of statistical methods for finding differentially expressed genes and point to the need to study gene expression on a global level.
Resumo:
Place identification refers to the process of analyzing sensor data in order to detect places, i.e., spatial areas that are linked with activities and associated with meanings. Place information can be used, e.g., to provide awareness cues in applications that support social interactions, to provide personalized and location-sensitive information to the user, and to support mobile user studies by providing cues about the situations the study participant has encountered. Regularities in human movement patterns make it possible to detect personally meaningful places by analyzing location traces of a user. This thesis focuses on providing system level support for place identification, as well as on algorithmic issues related to the place identification process. The move from location to place requires interactions between location sensing technologies (e.g., GPS or GSM positioning), algorithms that identify places from location data and applications and services that utilize place information. These interactions can be facilitated using a mobile platform, i.e., an application or framework that runs on a mobile phone. For the purposes of this thesis, mobile platforms automate data capture and processing and provide means for disseminating data to applications and other system components. The first contribution of the thesis is BeTelGeuse, a freely available, open source mobile platform that supports multiple runtime environments. The actual place identification process can be understood as a data analysis task where the goal is to analyze (location) measurements and to identify areas that are meaningful to the user. The second contribution of the thesis is the Dirichlet Process Clustering (DPCluster) algorithm, a novel place identification algorithm. The performance of the DPCluster algorithm is evaluated using twelve different datasets that have been collected by different users, at different locations and over different periods of time. As part of the evaluation we compare the DPCluster algorithm against other state-of-the-art place identification algorithms. The results indicate that the DPCluster algorithm provides improved generalization performance against spatial and temporal variations in location measurements.
Resumo:
Web data can often be represented in free tree form; however, free tree mining methods seldom exist. In this paper, a computationally fast algorithm FreeS is presented to discover all frequently occurring free subtrees in a database of labelled free trees. FreeS is designed using an optimal canonical form, BOCF that can uniquely represent free trees even during the presence of isomorphism. To avoid enumeration of false positive candidates, it utilises the enumeration approach based on a tree-structure guided scheme. This paper presents lemmas that introduce conditions to conform the generation of free tree candidates during enumeration. Empirical study using both real and synthetic datasets shows that FreeS is scalable and significantly outperforms (i.e. few orders of magnitude faster than) the state-of-the-art frequent free tree mining algorithms, HybridTreeMiner and FreeTreeMiner.
Resumo:
An integrated approach of using strandings and bycatch data may provide an indicator of long-term trends for data-limited cetaceans. Strandings programs can give a faithful representation of the species composition of cetacean assemblages, while standardised bycatch rates can provide a measure of relative abundance. Comparing the two datasets may also facilitate managing impacts by understanding which species, sex or sizes are the most vulnerable to interactions with fisheries gear. Here we apply this approach to two long-term datasets in East Australia, bycatch in the Queensland Shark Control Program (QSCP, 1992–2012) and strandings in the Queensland Marine Wildlife Strandings and Mortality Program (StrandNet, 1996–2012). Short-beaked common dolphins, Delphinus delphis, were markedly more frequent in bycatch than in the strandings dataset, suggesting that they are more prone to being incidentally caught than other cetacean species in the region. The reverse was true for humpback whales, Megaptera novaeangliae, bottlenose dolphins, Tursiops spp.; and species predominantly found in offshore waters. QSCP bycatch was strongly skewed towards females for short-beaked common dolphins, and towards smaller sizes for Australian humpback dolphins, Sousa sahulensis. Overall, both datasets demonstrated similar seasonality and a similar long-term increase from 1996 until 2008. Analysis on a species-by-species basis was then used to explore potential explanations for long-term trends, which ranged from a recovering stock (humpback whales) to a shift in habitat use (short-beaked common dolphins).
Resumo:
Endoraecium is a genus of rust fungi that infects several species of Acacia in Australia, South-East Asia and Hawaii. This study investigated the systematics of Endoraecium from 55 specimens in Australia based on a combined morphological and molecular approach. Phylogenetic analyses were conducted on partitioned datasets of loci from ribosomal and mitochondrial DNA. The recovered molecular phylogeny supported a recently published taxonomy based on morphology and host range that divided Endoraecium digitatum into five species. Spore morphology is synapomorphic and there is evidence Endoraecium co-evolved with its Acacia hosts. The broad host ranges of E. digitatum, E. parvum, E. phyllodiorum and E. violae-faustiae are revised in light of this study, and nine new species of Endoraecium are described from Australia based on host taxonomy, morphology and phylogenetic concordance.
Resumo:
An application that translates raw thermal melt curve data into more easily assimilated knowledge is described. This program, called ‘Meltdown’, performs a number of data remediation steps before classifying melt curves and estimating melting temperatures. The final output is a report that summarizes the results of a differential scanning fluorimetry experiment. Meltdown uses a Bayesian classification scheme, enabling reproducible identification of various trends commonly found in DSF datasets. The goal of Meltdown is not to replace human analysis of the raw data, but to provide a sensible interpretation of the data to make this useful experimental technique accessible to naïve users, as well as providing a starting point for detailed analyses by more experienced users.