12 results for 080403 Data Structures

in Helda - Digital Repository of University of Helsinki


Relevance:

80.00%

Abstract:

DEVELOPING A TEXTILE ONTOLOGY FOR THE SEMANTIC WEB AND CONNECTING IT TO MUSEUM CATALOGING DATA

The goal of the Semantic Web is to share concept-based information in a versatile way on the Internet. This is achievable using formal data structures called ontologies. The goal of this research is to increase the usability of museum cataloging data in information retrieval. The work is interdisciplinary, involving craft science, terminology science, computer science, and museology. In the first part of the dissertation an ontology of concepts of textiles, garments, and accessories is developed for museum cataloging work. The ontology work was done with the help of thesauri, vocabularies, research reports, and standards. The basis of the ontology development was the Museoalan asiasanasto MASA, a thesaurus for museum cataloging work which has been enriched by other vocabularies. Concepts and terms concerning the research object, as well as the material names of textiles, costumes, and accessories, were focused on. The research method was terminological concept analysis complemented by an ontological view of the Semantic Web. The concept structure was based on the hierarchical generic relation. Attention was also paid to other relations between terms and concepts, and between concepts themselves. Altogether 977 concept classes were created. Issues including how to choose and name concepts for the ontology hierarchy and how deep and broad the hierarchy could be are discussed from the viewpoint of the ontology developer and museum cataloger. The second part of the dissertation analyzes why some of the cataloged terms did not match with the developed textile ontology. This problem is significant because it prevents automatic ontological content integration of the cataloged data on the Semantic Web. The research datasets, i.e. the cataloged museum data on textile collections, came from three museums: Espoo City Museum, Lahti City Museum and The National Museum of Finland.
The data included 1803 textile, costume, and accessory objects. Unmatched object and textile material names were analyzed. In the case of the object names six categories (475 cases), and of the material names eight categories (423 cases), were found where automatic annotation was not possible. The most common explanation was that the cataloged field was filled with a long sentence composed of many terms. Sometimes in the compound term, the object name and material, or the name and the way of usage, were combined. In addition, numeric values in the material name cataloging field prevented annotation, as did the absence of a corresponding concept in the ontology. Ready-made drop-down lists of materials used in one cataloging system facilitated the annotation. In the case of naming objects and materials, one should use terms in basic form without attributes. The developed textile ontology has been applied in two cultural portals, MuseumFinland and Culturesampo, where one can search for and browse information based on cataloged data using integrated ontologies in an interoperable way. The textile ontology is also part of the national FinnONTO ontology infrastructure.

Keywords: annotation, concept, concept analysis, cataloging, museum collection, ontology, Semantic Web, textile collection, textile material
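The annotation failures described in this abstract can be illustrated with a minimal sketch. It assumes exact, case-insensitive matching of a cataloging field against ontology labels; the tiny `ONTOLOGY` table, the concept identifiers, and the `annotate` function are hypothetical and are not taken from the dissertation or from FinnONTO.

```python
import re

# Hypothetical miniature ontology: preferred label -> concept identifier.
ONTOLOGY = {"skirt": "c:skirt", "wool": "c:wool", "linen": "c:linen"}


def annotate(field_value):
    """Return (concept_id, None) on success or (None, reason) on failure.

    The failure reasons mirror three of the explanations named in the
    abstract; the matching rule itself is an assumption of this sketch.
    """
    value = field_value.strip().lower()
    if re.search(r"\d", value):
        return None, "numeric value in field"
    if len(value.split()) > 1:
        return None, "several terms in one field"
    concept = ONTOLOGY.get(value)
    if concept is None:
        return None, "no corresponding concept"
    return concept, None
```

A field holding a single term in basic form, such as `annotate("Wool")`, is matched, while a free-text field such as `annotate("skirt of blue wool")` is rejected, which is in line with the recommendation to catalog terms in basic form without attributes.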

Relevance:

80.00%

Abstract:

Event-based systems are seen as good candidates for supporting distributed applications in dynamic and ubiquitous environments because they support decoupled and asynchronous many-to-many information dissemination. Event systems are widely used, because asynchronous messaging provides a flexible alternative to RPC (Remote Procedure Call). They are typically implemented using an overlay network of routers. A content-based router forwards event messages based on filters that are installed by subscribers and other routers. The filters are organized into a routing table in order to forward incoming events to proper subscribers and neighbouring routers. This thesis addresses the optimization of content-based routing tables organized using the covering relation and presents novel data structures and configurations for improving local and distributed operation. Data structures are needed for organizing filters into a routing table that supports efficient matching and runtime operation. We present novel results on dynamic filter merging and the integration of filter merging with content-based routing tables. In addition, the thesis examines the cost of client mobility using different protocols and routing topologies. We also present a new matching technique called temporal subspace matching. The technique combines two new features. The first feature, temporal operation, supports notifications, or content profiles, that persist in time. The second feature, subspace matching, allows more expressive semantics, because notifications may contain intervals and be defined as subspaces of the content space. We also present an application of temporal subspace matching pertaining to metadata-based continuous collection and object tracking.
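One possible reading of temporal subspace matching can be sketched as follows. The sketch assumes that filters and notifications are both sets of closed intervals per attribute, that a match means overlap on every attribute the filter constrains, and that a notification carries a validity window. The function names and the dictionary representation are hypothetical; the thesis may define the semantics differently.

```python
def overlaps(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def subspace_match(filter_box, notification_box):
    """A filter matches when every attribute it constrains overlaps the
    interval the notification gives for that attribute."""
    return all(
        attr in notification_box and overlaps(rng, notification_box[attr])
        for attr, rng in filter_box.items()
    )


def temporal_match(filter_box, notification_box, valid, now):
    """Match only while the notification persists; valid = (start, end)."""
    return valid[0] <= now <= valid[1] and subspace_match(filter_box, notification_box)
```

For example, a filter `{"x": (0, 10)}` matches a notification `{"x": (5, 20), "y": (1, 2)}` at any time inside the notification's validity window and stops matching once the window has passed.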

Relevance:

80.00%

Abstract:

The publish/subscribe paradigm has lately received much attention. In publish/subscribe systems, a specialized event-based middleware delivers notifications of events created by producers (publishers) to consumers (subscribers) interested in that particular event. It is considered a good approach for implementing Internet-wide distributed systems as it provides full decoupling of the communicating parties in time, space and synchronization. One flavor of the paradigm is content-based publish/subscribe which allows the subscribers to express their interests very accurately. In order to implement a content-based publish/subscribe middleware in a way suitable for Internet scale, its underlying architecture must be organized as a peer-to-peer network of content-based routers that take care of forwarding the event notifications to all interested subscribers. A communication infrastructure that provides such service is called a content-based network. A content-based network is an application-level overlay network. Unfortunately, the expressiveness of the content-based interaction scheme comes at a price: compiling and maintaining the content-based forwarding and routing tables is very expensive when the number of nodes in the network is large. The routing tables are usually partially ordered set (poset)-based data structures. In this work, we present an algorithm that aims to improve scalability in content-based networks by reducing the workload of content-based routers by offloading some of their content routing cost to clients. We also provide experimental results of the performance of the algorithm. Additionally, we give an introduction to the publish/subscribe paradigm and content-based networking and discuss alternative ways of improving scalability in content-based networks. ACM Computing Classification System (CCS): C.2.4 [Computer-Communication Networks]: Distributed Systems - Distributed applications
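The partial order behind poset-based routing tables can be sketched with the covering relation: a filter covers another when every event matching the second also matches the first. The sketch below assumes filters are conjunctions of closed interval constraints, stored as hypothetical dictionaries of attribute to `(low, high)`. It is a quadratic illustration of the relation, not the algorithm presented in the thesis.

```python
def covers(f1, f2):
    """True if f1 covers f2: f1 constrains no attribute that f2 leaves open,
    and each f1 interval contains the corresponding f2 interval."""
    return all(
        attr in f2 and lo <= f2[attr][0] and f2[attr][1] <= hi
        for attr, (lo, hi) in f1.items()
    )


def uncovered(filters):
    """Return the maximal elements of the poset: filters not covered by any
    other filter. Only these would need to be forwarded to neighbours."""
    result = []
    for i, f in enumerate(filters):
        dominated = False
        for j, g in enumerate(filters):
            if i == j or not covers(g, f):
                continue
            # g covers f; among identical filters keep only the first copy.
            if not covers(f, g) or j < i:
                dominated = True
                break
        if not dominated:
            result.append(f)
    return result
```

With filters `{"price": (0, 100)}` and `{"price": (10, 20)}`, the first covers the second, so a router that has already forwarded the first does not need to forward the second.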

Relevance:

80.00%

Abstract:

An efficient and statistically robust solution for the identification of asteroids among numerous sets of astrometry is presented. In particular, numerical methods have been developed for the short-term identification of asteroids at discovery, and for the long-term identification of scarcely observed asteroids over apparitions, a task which has been lacking a robust method until now. The methods are based on the solid foundation of statistical orbital inversion properly taking into account the observational uncertainties, which allows for the detection of practically all correct identifications. Through the use of dimensionality-reduction techniques and efficient data structures, the exact methods have a loglinear, that is, O(n log n), computational complexity, where n is the number of included observation sets. The methods developed are thus suitable for future large-scale surveys which anticipate a substantial increase in the astrometric data rate. Due to the discontinuous nature of asteroid astrometry, separate sets of astrometry must be linked to a common asteroid from the very first discovery detections onwards. The reason for the discontinuity in the observed positions is the rotation of the observer with the Earth as well as the motion of the asteroid and the observer about the Sun. Therefore, the aim of identification is to find a set of orbital elements that reproduce the observed positions with residuals similar to the inevitable observational uncertainty. Unless the astrometric observation sets are linked, the corresponding asteroid is eventually lost as the uncertainty of the predicted positions grows too large to allow successful follow-up.
Whereas the presented identification theory and the numerical comparison algorithm are generally applicable, that is, also in fields other than astronomy (e.g., in the identification of space debris), the numerical methods developed for asteroid identification can immediately be applied to all objects on heliocentric orbits with negligible effects due to non-gravitational forces in the time frame of the analysis. The methods developed have been successfully applied to various identification problems. Simulations have shown that the methods developed are able to find virtually all correct linkages despite challenges such as numerous scarce observation sets, astrometric uncertainty, numerous objects confined to a limited region on the celestial sphere, long linking intervals, and substantial parallaxes. Tens of previously unknown main-belt asteroids have been identified with the short-term method in a preliminary study to locate asteroids among numerous unidentified sets of single-night astrometry of moving objects, and scarce astrometry obtained nearly simultaneously with Earth-based and space-based telescopes has been successfully linked despite a substantial parallax. Using the long-term method, thousands of realistic 3-linkages typically spanning several apparitions have so far been found among designated observation sets each spanning less than 48 hours.
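How sorting can replace an all-pairs comparison is illustrated below in one dimension. This is a generic sketch and not the method of the thesis: it assumes each observation set has already been reduced to a single numeric key (a hypothetical stand-in for the dimensionality-reduction step) and that candidate linkages are pairs whose keys differ by at most `tol`. The cost is O(n log n) for sorting and searching, plus the number of pairs reported.

```python
from bisect import bisect_right


def candidate_pairs(keys, tol):
    """Return index pairs (i, j) whose keys differ by at most tol.

    Sorting once and binary-searching the upper end of each window avoids
    comparing every key with every other key.
    """
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    sorted_keys = [keys[i] for i in order]
    pairs = []
    for pos, key in enumerate(sorted_keys):
        end = bisect_right(sorted_keys, key + tol)
        for other in range(pos + 1, end):
            pairs.append((order[pos], order[other]))
    return pairs
```

Only the pairs returned here would then be passed to a more expensive check, such as a full orbit computation.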

Relevance:

80.00%

Abstract:

We study the following problem: given a geometric graph G and an integer k, determine if G has a planar spanning subgraph (with the original embedding and straight-line edges) such that all nodes have degree at least k. If G is a unit disk graph, the problem is trivial to solve for k = 1. We show that even the slightest deviation from the trivial case (e.g., quasi unit disk graphs or k = 2) leads to NP-hard problems.
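Although finding such a subgraph is hard in general, checking a proposed solution is straightforward. The sketch below verifies that a chosen edge set has minimum degree at least k and that no two straight-line edges cross. It assumes a non-empty point set in general position (no three points collinear), and the function names are hypothetical.

```python
from itertools import combinations


def orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, 0 or -1."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def properly_cross(a, b, c, d):
    """True if segments ab and cd cross at a single interior point."""
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)


def is_valid_subgraph(points, edges, k):
    """Check minimum degree k and planarity of the straight-line drawing."""
    degree = [0] * len(points)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    if min(degree) < k:
        return False
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if {u1, v1} & {u2, v2}:
            continue  # edges sharing an endpoint cannot cross properly
        if properly_cross(points[u1], points[v1], points[u2], points[v2]):
            return False
    return True
```

On the four corners of a square, the boundary cycle passes the check for k = 2, while the two diagonals fail it for any k because they cross.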

Relevance:

40.00%

Abstract:

Advancements in the analysis techniques have led to a rapid accumulation of biological data in databases. Such data often are in the form of sequences of observations, examples including DNA sequences and amino acid sequences of proteins. The scale and quality of the data give promises of answering various biologically relevant questions in more detail than what has been possible before. For example, one may wish to identify areas in an amino acid sequence, which are important for the function of the corresponding protein, or investigate how characteristics on the level of DNA sequence affect the adaptation of a bacterial species to its environment. Many of the interesting questions are intimately associated with the understanding of the evolutionary relationships among the items under consideration. The aim of this work is to develop novel statistical models and computational techniques to meet with the challenge of deriving meaning from the increasing amounts of data. Our main concern is on modeling the evolutionary relationships based on the observed molecular data. We operate within a Bayesian statistical framework, which allows a probabilistic quantification of the uncertainties related to a particular solution. As the basis of our modeling approach we utilize a partition model, which is used to describe the structure of data by appropriately dividing the data items into clusters of related items. Generalizations and modifications of the partition model are developed and applied to various problems. Large-scale data sets provide also a computational challenge. The models used to describe the data must be realistic enough to capture the essential features of the current modeling task but, at the same time, simple enough to make it possible to carry out the inference in practice. The partition model fulfills these two requirements. The problem-specific features can be taken into account by modifying the prior probability distributions of the model parameters. 
The computational efficiency stems from the ability to integrate out the parameters of the partition model analytically, which enables the use of efficient stochastic search algorithms.
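What integrating out the parameters analytically can look like is sketched below for one simple case. The sketch assumes categorical observations (for example nucleotides) and a symmetric Dirichlet prior on each cluster's category probabilities, which gives a closed-form Dirichlet-multinomial marginal likelihood. These modelling choices and the function names are assumptions of the illustration; the abstract does not specify the models, and the prior over partitions is left out.

```python
from collections import Counter
from math import lgamma


def log_marginal(cluster, alphabet_size, alpha=1.0):
    """Log marginal likelihood of one cluster of categorical observations,
    with the category probabilities integrated out under a symmetric
    Dirichlet(alpha) prior."""
    counts = Counter(cluster)
    n = len(cluster)
    total = lgamma(alphabet_size * alpha) - lgamma(alphabet_size * alpha + n)
    for c in counts.values():
        total += lgamma(alpha + c) - lgamma(alpha)
    return total


def log_partition_score(partition, alphabet_size, alpha=1.0):
    """Score a partition as the sum of its clusters' log marginals."""
    return sum(log_marginal(c, alphabet_size, alpha) for c in partition)
```

Because the score of a partition needs only per-cluster counts, a stochastic search can evaluate a move of one item between clusters by rescoring just the two clusters involved.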

Relevance:

40.00%

Abstract:

Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into some number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool in learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments, and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation on certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined and algorithms for solving them are provided and analyzed. Practical applications for segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis segmentation applications are demonstrated in analyzing genomic sequences.
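The efficient computation of an optimal piecewise-constant segmentation is usually done with dynamic programming. The sketch below assumes one-dimensional points and squared error around each segment mean as the cost, which corresponds to the most likely description under Gaussian noise with a common variance. The function name and return format are hypothetical, and the routine expects 1 <= k <= len(points).

```python
def segment(points, k):
    """Optimal k-segmentation under squared error, in O(k * n^2) time.

    Returns (total_error, bounds) where bounds is a list of half-open
    index ranges (start, end), one per segment.
    """
    n = len(points)
    prefix, prefix_sq = [0.0], [0.0]
    for x in points:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def cost(i, j):
        """Squared error of points[i:j] around its mean, via prefix sums."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for seg in range(1, k + 1):
        for j in range(seg, n + 1):
            for i in range(seg - 1, j):
                candidate = best[seg - 1][i] + cost(i, j)
                if candidate < best[seg][j]:
                    best[seg][j] = candidate
                    back[seg][j] = i

    # Walk the back-pointers from the end to recover segment boundaries.
    bounds, j = [], n
    for seg in range(k, 0, -1):
        i = back[seg][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return best[k][n], bounds
```

Running the routine for several values of k and comparing the errors with a model selection criterion is one way to approach the question of how many segments to use.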

Relevance:

30.00%

Abstract:

The importance of supercontinents in our understanding of the geological evolution of the planet Earth has been recently emphasized. The role of paleomagnetism in reconstructing lithospheric blocks in their ancient paleopositions is vital. Paleomagnetism is the only quantitative tool for providing ancient latitudes and azimuthal orientations of continents. It also yields information of content of the geomagnetic field in the past. In order to obtain a continuous record on the positions of continents, dated intrusive rocks are required in temporal progression. This is not always possible due to pulse-like occurrences of dykes. In this work we demonstrate that studies of meteorite impact-related rocks may fill some gaps in the paleomagnetic record. This dissertation is based on paleomagnetic and rock magnetic data obtained from samples of the Jänisjärvi impact structure (Russian Karelia, most recent 40Ar-39Ar age of 682 Ma), the Salla diabase dyke (North Finland, U-Pb 1122 Ma), the Valaam monzodioritic sill (Russian Karelia, U-Pb 1458 Ma), and the Vredefort impact structure (South Africa, 2023 Ma). The paleomagnetic study of Jänisjärvi samples was made in order to obtain a pole for Baltica, which lacks paleomagnetic data from 750 to ca. 600 Ma. The position of Baltica at ca. 700 Ma is relevant in order to verify whether the supercontinent Rodinia was already fragmented. The paleomagnetic study of the Salla dyke was conducted to examine the position of Baltica at the onset of supercontinent Rodinia's formation. The virtual geomagnetic pole (VGP) from Salla dyke provides hints that the Mesoproterozoic Baltica - Laurentia unity in the Hudsonland (Columbia, Nuna) supercontinent assembly may have lasted until 1.12 Ga. Moreover, the new VGP of Salla dyke provides new constraint on the timing of the rotation of Baltica relative to Laurentia (e.g. Gower et al., 1990). 
A paleomagnetic study of the Valaam sill was carried out in order to shed light on the question of the existence of Baltica-Laurentia unity in the supercontinent Hudsonland. Combined with results from the dyke complex of the Lake Ladoga region (Schehrbakova et al., 2008) a new robust paleomagnetic pole for Baltica is obtained. This pole places Baltica at a latitude of 10°. This low latitude location is supported also by Mesoproterozoic 1.5–1.3 Ga red-bed sedimentation (for example the Satakunta sandstone). The Vredefort impactite samples provide a well dated (2.02 Ga) pole for the Kaapvaal Craton. Rock magnetic data reveal unusually high Koenigsberger ratios (Q values) in all studied lithologies of the Vredefort dome. The high Q values are now seen for the first time also in samples from the Johannesburg Dome (ca. 120 km away) where there is no impact evidence. Thus, a direct causative link of high Q values to the Vredefort impact event can be ruled out.
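Two standard quantities mentioned in this abstract can be computed as in the sketch below. It assumes the geocentric axial dipole relation tan I = 2 tan λ between magnetic inclination I and paleolatitude λ, and the usual definition of the Koenigsberger ratio as remanent over induced magnetization in SI units. The function names are hypothetical and no values from the dissertation are used.

```python
from math import atan, degrees, radians, tan


def paleolatitude(inclination_deg):
    """Paleolatitude in degrees from inclination, assuming a geocentric
    axial dipole field: tan(I) = 2 * tan(latitude)."""
    return degrees(atan(tan(radians(inclination_deg)) / 2.0))


def koenigsberger_ratio(nrm, susceptibility, field):
    """Q = NRM / (susceptibility * H), with NRM and H in A/m and the
    volume susceptibility dimensionless (SI)."""
    return nrm / (susceptibility * field)
```

A Q value above 1 means that remanent magnetization dominates over the magnetization induced by the present field.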

Relevance:

30.00%

Abstract:

Scattering of X-rays and neutrons has been applied to the study of nanostructures with interesting biological functions. The systems studied were the protein calmodulin and its complexes, bacterial virus bacteriophage phi6, and the photosynthetic antenna complex from green sulfur bacteria, chlorosome. Information gathered using various structure determination methods has been combined to the low resolution information obtained from solution scattering. Conformational changes in calmodulin-ligand complex were studied by combining the directional information obtained from residual dipole couplings in nuclear magnetic resonance to the size information obtained from small-angle X-ray scattering from solution. The locations of non-structural protein components in a model of bacteriophage phi6, based mainly on electron microscopy, were determined by neutron scattering, deuterium labeling and contrast variation. New data are presented on the structure of the photosynthetic antenna complex of green sulfur bacteria and filamentous anoxygenic phototrophs, also known as the chlorosome. The X-ray scattering and electron cryomicroscopy results from this system are interpreted in the context of a new structural model detailed in the third paper of this dissertation. The model is found to be consistent with the results obtained from various chlorosome containing bacteria. The effect of carotenoid synthesis on the chlorosome structure and self-assembly are studied by carotenoid extraction, biosynthesis inhibition and genetic manipulation of the enzymes involved in carotenoid biosynthesis. Carotenoid composition and content are found to have a marked effect on the structural parameters and morphology of chlorosomes.

Relevance:

30.00%

Abstract:

Physical properties provide valuable information about the nature and behavior of rocks and minerals. The changes in rock physical properties generate petrophysical contrasts between various lithologies, for example, between shocked and unshocked rocks in meteorite impact structures or between various lithologies in the crust. These contrasts may cause distinct geophysical anomalies, which are often diagnostic to their primary cause (impact, tectonism, etc). This information is vital to understand the fundamental Earth processes, such as impact cratering and associated crustal deformations. However, most of the present day knowledge of changes in rock physical properties is limited due to a lack of petrophysical data of subsurface samples, especially for meteorite impact structures, since they are often buried under post-impact lithologies or eroded. In order to explore the uppermost crust, deep drillings are required. This dissertation is based on the deep drill core data from three impact structures: (i) the Bosumtwi impact structure (diameter 10.5 km, 1.07 Ma age; Ghana), (ii) the Chesapeake Bay impact structure (85 km, 35 Ma; Virginia, U.S.A.), and (iii) the Chicxulub impact structure (180 km, 65 Ma; Mexico). These drill cores have yielded all basic lithologies associated with impact craters such as post-impact lithologies, impact rocks including suevites and breccias, as well as fractured and unfractured target rocks. The fourth study case of this dissertation deals with the data of the Paleoproterozoic Outokumpu area (Finland), as a non-impact crustal case, where a deep drilling through an economically important ophiolite complex was carried out. The focus in all four cases was to combine results of basic petrophysical studies of relevant rocks of these crustal structures in order to identify and characterize various lithologies by their physical properties and, in this way, to provide new input data for geophysical modellings. 
Furthermore, the rock magnetic and paleomagnetic properties of three impact structures, combined with basic petrophysics, were used to acquire insight into the impact-generated changes in rocks and their magnetic minerals, in order to better understand the influence of impact. The obtained petrophysical data outline the various lithologies and divide rocks into four domains. Based on target lithology, the physical properties of the unshocked target rocks are controlled by mineral composition or fabric, particularly porosity in sedimentary rocks, while sediments result from diverse sedimentation and diagenesis processes. The impact rocks, such as breccias and suevites, strongly reflect the impact formation mechanism and are distinguishable from the other lithologies by their density, porosity and magnetic properties. The numerous shock features resulting from melting, brecciation and fracturing of the target rocks can be seen in the changes of physical properties. These features include an increase in porosity and subsequent decrease in density in impact-derived units, either an increase or a decrease in magnetic properties (depending on the specific case), as well as large heterogeneity in physical properties. In a few cases a slight gradual downward decrease in porosity, related to shock-induced fracturing, was observed. Coupled with rock magnetic studies, the impact-generated changes in the magnetic fraction can be seen: shock-induced magnetic grain size reduction, hydrothermal- or melting-related magnetic mineral alteration, shock demagnetization, and shock- or temperature-related remagnetization. The Outokumpu drill core shows varying velocities throughout the drill core depending on the microcracking and sample conditions. This is similar to observations by Kern et al. (2009), who also reported the velocity dependence on anisotropy.
The physical properties are also used to explain the distinct crustal reflectors as observed in seismic reflection studies in the Outokumpu area. According to the seismic velocity data, the interfaces between the diopside-tremolite skarn layer and either serpentinite, mica schist or black schist are causing the strong seismic reflectivities.

Relevance:

30.00%

Abstract:

This thesis report attempts to improve the models for predicting forest stand structure for practical use, e.g. forest management planning (FMP) purposes in Finland. Comparisons were made between Weibull and Johnson's SB distribution and alternative regression estimation methods. Data used for preliminary studies was local but the final models were based on representative data. Models were validated mainly in terms of bias and RMSE in the main stand characteristics (e.g. volume) using independent data. The bivariate SBB distribution model was used to mimic realistic variations in tree dimensions by including within-diameter-class height variation. Using the traditional method, diameter distribution with the expected height resulted in reduced height variation, whereas the alternative bivariate method utilized the error-term of the height model. The lack of models for FMP was covered to some extent by the models for peatland and juvenile stands. The validation of these models showed that the more sophisticated regression estimation methods provided slightly improved accuracy. A flexible prediction and application for stand structure consisted of seemingly unrelated regression models for eight stand characteristics, the parameters of three optional distributions and Näslund's height curve. The cross-model covariance structure was used for linear prediction application, in which the expected values of the models were calibrated with the known stand characteristics. This provided a framework to validate the optional distributions and the optional set of stand characteristics. Height distribution is recommended for the earliest state of stands because of its continuous feature. From the mean height of about 4 m, Weibull dbh-frequency distribution is recommended in young stands if the input variables consist of arithmetic stand characteristics. In advanced stands, basal area-dbh distribution models are recommended. Näslund's height curve proved useful.
Some efficient transformations of stand characteristics are introduced, e.g. the shape index, which combined the basal area, the stem number and the median diameter. Shape index enabled SB model for peatland stands to detect large variation in stand densities. This model also demonstrated reasonable behaviour for stands in mineral soils.
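How a predicted diameter distribution and a height curve turn into a stand description can be sketched as follows. The sketch assumes the two-parameter Weibull distribution for breast-height diameters and Näslund's height curve in the form h = 1.3 + d² / (a + b·d)². The function names are hypothetical, and any parameter values a caller supplies are illustrative only, since the abstract gives no fitted values.

```python
from math import exp


def weibull_cdf(d, scale, shape):
    """Two-parameter Weibull CDF for diameter d (cm)."""
    if d <= 0:
        return 0.0
    return 1.0 - exp(-((d / scale) ** shape))


def stems_per_class(stems_per_ha, scale, shape, class_edges):
    """Expected number of stems per hectare in each diameter class,
    given consecutive class edges such as [0, 10, 20, 30]."""
    return [
        stems_per_ha * (weibull_cdf(hi, scale, shape) - weibull_cdf(lo, scale, shape))
        for lo, hi in zip(class_edges, class_edges[1:])
    ]


def naslund_height(d, a, b):
    """Naslund's height curve: tree height (m) from diameter d (cm)."""
    return 1.3 + d ** 2 / (a + b * d) ** 2
```

Pairing each diameter class with the height from the curve gives the expected-height approach described in the abstract, which by construction has no height variation within a diameter class.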