8 resultados para Data Extraction

em Doria (National Library of Finland DSpace Services) - National Library of Finland, Finland


Relevância:

60.00% 60.00%

Publicador:

Resumo:

This thesis evaluates methods for obtaining high performance in applications running on the mobile Java platform. Based on the evaluated methods, an optimization was done to a Java extension API running on top the Symbian operating system. The API provides location-based services for mobile Java applications. As a part of this thesis, the JNI implementation in Symbian OS was also benchmarked. A benchmarking tool was implemented in the analysis phase in order to implement extensive performance test set. Based on the benchmark results, it was noted that the landmarks implementation of the API was performing very slowly with large amounts of data. The existing implementation proved to be very inconvenient for optimization because the early implementers did not take performance and design issues into consideration. A completely new architecture was implemented for the API in order to provide scalable landmark initialization and data extraction by using lazy initialization methods. Additionally, runtime memory consumption was also an important part of the optimization. The improvement proved to be very efficient based on the measurements after the optimization. Most of the common API use cases performed extremely well compared to the old implementation. Performance optimization is an important quality attribute of any piece of software especially in embedded mobile devices. Typically, projects get into trouble with performance because there are no clear performance targets and knowledge how to achieve them. Well-known guidelines and performance models help to achieve good overall performance in Java applications and programming interfaces.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The main objective of this study was todo a statistical analysis of ecological type from optical satellite data, using Tipping's sparse Bayesian algorithm. This thesis uses "the Relevence Vector Machine" algorithm in ecological classification betweenforestland and wetland. Further this bi-classification technique was used to do classification of many other different species of trees and produces hierarchical classification of entire subclasses given as a target class. Also, we carried out an attempt to use airborne image of same forest area. Combining it with image analysis, using different image processing operation, we tried to extract good features and later used them to perform classification of forestland and wetland.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this thesis author approaches the problem of automated text classification, which is one of basic tasks for building Intelligent Internet Search Agent. The work discusses various approaches to solving sub-problems of automated text classification, such as feature extraction and machine learning on text sources. Author also describes her own multiword approach to feature extraction and pres-ents the results of testing this approach using linear discriminant analysis based classifier, and classifier combining unsupervised learning for etalon extraction with supervised learning using common backpropagation algorithm for multilevel perceptron.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this work we study the classification of forest types using mathematics based image analysis on satellite data. We are interested in improving classification of forest segments when a combination of information from two or more different satellites is used. The experimental part is based on real satellite data originating from Canada. This thesis gives summary of the mathematics basics of the image analysis and supervised learning , methods that are used in the classification algorithm. Three data sets and four feature sets were investigated in this thesis. The considered feature sets were 1) histograms (quantiles) 2) variance 3) skewness and 4) kurtosis. Good overall performances were achieved when a combination of ASTERBAND and RADARSAT2 data sets was used.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The major type of non-cellulosic polysaccharides (hemicelluloses) in softwoods, the partly acetylated galactoglucomannans (GGMs), which comprise about 15% of spruce wood, have attracted growing interest because of their potential to become high-value products with applications in many areas. The main objective of this work was to explore the possibilities to extract galactoglucomannans in native, polymeric form in high yield from spruce wood with pressurised hot-water, and to obtain a deeper understanding of the process chemistry involved. Spruce (Picea abies) chips and ground wood particles were extracted using an accelerated solvent extractor (ASE) in the temperature range 160 – 180°C. Detailed chemical analyses were done on both the water extracts and the wood residues. As much as 80 – 90% of the GGMs in spruce wood, i.e. about 13% based on the original wood, could be extracted from ground spruce wood with pure water at 170 – 180°C with an extraction time of 60 min. GGMs comprised about 75% of the extracted carbohydrates and about 60% of the total dissolved solids. Other substances in the water extracts were xylans, arabinogalactans, pectins, lignin and acetic acid. The yields from chips were only about 60% of that from ground wood. Both the GGMs and other non-cellulosic polysaccharides were extensively hydrolysed at severe extraction conditions when pH dropped to the level of 3.5. Addition of sodium bicarbonate increased the yields of polymeric GGMs at low additions, 2.5 – 5 mM, where the end pH remained around 3.9. However, at higher addition levels the yields decreased, mainly because the acetyl groups in GGMs were split off, leading to a low solubility of GGMs. Extraction with buffered water in the pH range 3.8 – 4.4 gave similar yields as with plain water, but gave a higher yield of polymeric GGMs. Moreover, at these pH levels the hydrolysis of acetyl groups in GGMs was significantly inhibited. It was concluded that hot-water extraction of polymeric GGMs in good yields (up to 8% of wood) demands appropriate control of pH, in a narrow range about 4. These results were supported by a study of hydrolysis of GGM at constant pH in the range of 3.8 – 4.2 where a kinetic model for degradation of GGM was developed. The influence of wood particle size on hot-water extraction was studied with particles in the range of 0.1 – 2 mm. The smallest particles (< 0.1 mm) gave 20 – 40% higher total yield than the coarsest particles (1.25 – 2 mm). The difference was greatest at short extraction times. The results indicated that extraction of GGMs and other polysaccharides is limited mainly by the mass transfer in the fibre wall, and for coarse wood particles also in the wood matrix. Spruce sapwood, heartwood and thermomechnical pulp were also compared, but only small differences in yields and composition of extracts were found. Two methods for isolation and purification of polymeric GGMs, i.e. membrane filtration and precipitation in ethanol-water, were compared. Filtration through a series of membranes with different pore sizes separated GGMs of different molar masses, from polymers to oligomers. Polysaccharides with molar mass higher than 4 kDa were precipitated in ethanol-water. GGMs comprised about 80% of the precipitated polysaccharides. Other polysaccharides were mainly arabinoglucuronoxylans and pectins. The ethanol-precipitated GGMs were by 13C NMR spectroscopy verified to be very similar to GGMs extracted from spruce wood in low yield at a much lower temperature, 90°C. The obtained large body of experimental data could be utilised for further kinetic and economic calculations to optimise technical hot-water extractionof softwoods.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Identification of low-dimensional structures and main sources of variation from multivariate data are fundamental tasks in data analysis. Many methods aimed at these tasks involve solution of an optimization problem. Thus, the objective of this thesis is to develop computationally efficient and theoretically justified methods for solving such problems. Most of the thesis is based on a statistical model, where ridges of the density estimated from the data are considered as relevant features. Finding ridges, that are generalized maxima, necessitates development of advanced optimization methods. An efficient and convergent trust region Newton method for projecting a point onto a ridge of the underlying density is developed for this purpose. The method is utilized in a differential equation-based approach for tracing ridges and computing projection coordinates along them. The density estimation is done nonparametrically by using Gaussian kernels. This allows application of ridge-based methods with only mild assumptions on the underlying structure of the data. The statistical model and the ridge finding methods are adapted to two different applications. The first one is extraction of curvilinear structures from noisy data mixed with background clutter. The second one is a novel nonlinear generalization of principal component analysis (PCA) and its extension to time series data. The methods have a wide range of potential applications, where most of the earlier approaches are inadequate. Examples include identification of faults from seismic data and identification of filaments from cosmological data. Applicability of the nonlinear PCA to climate analysis and reconstruction of periodic patterns from noisy time series data are also demonstrated. Other contributions of the thesis include development of an efficient semidefinite optimization method for embedding graphs into the Euclidean space. The method produces structure-preserving embeddings that maximize interpoint distances. It is primarily developed for dimensionality reduction, but has also potential applications in graph theory and various areas of physics, chemistry and engineering. Asymptotic behaviour of ridges and maxima of Gaussian kernel densities is also investigated when the kernel bandwidth approaches infinity. The results are applied to the nonlinear PCA and to finding significant maxima of such densities, which is a typical problem in visual object tracking.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Most of the applications of airborne laser scanner data to forestry require that the point cloud be normalized, i.e., each point represents height from the ground instead of elevation. To normalize the point cloud, a digital terrain model (DTM), which is derived from the ground returns in the point cloud, is employed. Unfortunately, extracting accurate DTMs from airborne laser scanner data is a challenging task, especially in tropical forests where the canopy is normally very thick (partially closed), leading to a situation in which only a limited number of laser pulses reach the ground. Therefore, robust algorithms for extracting accurate DTMs in low-ground-point-densitysituations are needed in order to realize the full potential of airborne laser scanner data to forestry. The objective of this thesis is to develop algorithms for processing airborne laser scanner data in order to: (1) extract DTMs in demanding forest conditions (complex terrain and low number of ground points) for applications in forestry; (2) estimate canopy base height (CBH) for forest fire behavior modeling; and (3) assess the robustness of LiDAR-based high-resolution biomass estimation models against different field plot designs. Here, the aim is to find out if field plot data gathered by professional foresters can be combined with field plot data gathered by professionally trained community foresters and used in LiDAR-based high-resolution biomass estimation modeling without affecting prediction performance. The question of interest in this case is whether or not the local forest communities can achieve the level technical proficiency required for accurate forest monitoring. The algorithms for extracting DTMs from LiDAR point clouds presented in this thesis address the challenges of extracting DTMs in low-ground-point situations and in complex terrain while the algorithm for CBH estimation addresses the challenge of variations in the distribution of points in the LiDAR point cloud caused by things like variations in tree species and season of data acquisition. These algorithms are adaptive (with respect to point cloud characteristics) and exhibit a high degree of tolerance to variations in the density and distribution of points in the LiDAR point cloud. Results of comparison with existing DTM extraction algorithms showed that DTM extraction algorithms proposed in this thesis performed better with respect to accuracy of estimating tree heights from airborne laser scanner data. On the other hand, the proposed DTM extraction algorithms, being mostly based on trend surface interpolation, can not retain small artifacts in the terrain (e.g., bumps, small hills and depressions). Therefore, the DTMs generated by these algorithms are only suitable for forestry applications where the primary objective is to estimate tree heights from normalized airborne laser scanner data. On the other hand, the algorithm for estimating CBH proposed in this thesis is based on the idea of moving voxel in which gaps (openings in the canopy) which act as fuel breaks are located and their height is estimated. Test results showed a slight improvement in CBH estimation accuracy over existing CBH estimation methods which are based on height percentiles in the airborne laser scanner data. However, being based on the idea of moving voxel, this algorithm has one main advantage over existing CBH estimation methods in the context of forest fire modeling: it has great potential in providing information about vertical fuel continuity. This information can be used to create vertical fuel continuity maps which can provide more realistic information on the risk of crown fires compared to CBH.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Our surrounding landscape is in a constantly dynamic state, but recently the rate of changes and their effects on the environment have considerably increased. In terms of the impact on nature, this development has not been entirely positive, but has rather caused a decline in valuable species, habitats, and general biodiversity. Regardless of recognizing the problem and its high importance, plans and actions of how to stop the detrimental development are largely lacking. This partly originates from a lack of genuine will, but is also due to difficulties in detecting many valuable landscape components and their consequent neglect. To support knowledge extraction, various digital environmental data sources may be of substantial help, but only if all the relevant background factors are known and the data is processed in a suitable way. This dissertation concentrates on detecting ecologically valuable landscape components by using geospatial data sources, and applies this knowledge to support spatial planning and management activities. In other words, the focus is on observing regionally valuable species, habitats, and biotopes with GIS and remote sensing data, using suitable methods for their analysis. Primary emphasis is given to the hemiboreal vegetation zone and the drastic decline in its semi-natural grasslands, which were created by a long trajectory of traditional grazing and management activities. However, the applied perspective is largely methodological, and allows for the application of the obtained results in various contexts. Models based on statistical dependencies and correlations of multiple variables, which are able to extract desired properties from a large mass of initial data, are emphasized in the dissertation. In addition, the papers included combine several data sets from different sources and dates together, with the aim of detecting a wider range of environmental characteristics, as well as pointing out their temporal dynamics. The results of the dissertation emphasise the multidimensionality and dynamics of landscapes, which need to be understood in order to be able to recognise their ecologically valuable components. This not only requires knowledge about the emergence of these components and an understanding of the used data, but also the need to focus the observations on minute details that are able to indicate the existence of fragmented and partly overlapping landscape targets. In addition, this pinpoints the fact that most of the existing classifications are too generalised as such to provide all the required details, but they can be utilized at various steps along a longer processing chain. The dissertation also emphases the importance of landscape history as an important factor, which both creates and preserves ecological values, and which sets an essential standpoint for understanding the present landscape characteristics. The obtained results are significant both in terms of preserving semi-natural grasslands, as well as general methodological development, giving support to science-based framework in order to evaluate ecological values and guide spatial planning.