47 results for Near-Duplicate Detection
in Helda - Digital Repository of University of Helsinki
Abstract:
Volatile organic compounds (VOCs) are emitted into the atmosphere from natural and anthropogenic sources, vegetation being the dominant source on a global scale. Some of these reactive compounds are deemed major contributors or inhibitors to aerosol particle formation and growth, thus making VOC measurements essential for current climate change research. This thesis discusses ecosystem scale VOC fluxes measured above a boreal Scots pine dominated forest in southern Finland. The flux measurements were performed using the micrometeorological disjunct eddy covariance (DEC) method combined with proton transfer reaction mass spectrometry (PTR-MS), which is an online technique for measuring VOC concentrations. The measurement, calibration, and calculation procedures developed in this work proved to be well suited to long-term VOC concentration and flux measurements with PTR-MS. A new averaging approach based on running averaged covariance functions improved the determination of the lag time between wind and concentration measurements, which is a common challenge in DEC when measuring fluxes near the detection limit. The ecosystem scale emissions of methanol, acetaldehyde, and acetone were substantial. These three oxygenated VOCs made up about half of the total emissions, with the rest comprised of monoterpenes. Contrary to the traditional assumption that monoterpene emissions from Scots pine originate mainly as evaporation from specialized storage pools, the DEC measurements indicated a significant contribution from de novo biosynthesis to the ecosystem scale monoterpene emissions. This thesis offers practical guidelines for long-term DEC measurements with PTR-MS. In particular, the new averaging approach to the lag time determination seems useful in the automation of DEC flux calculations. 
Seasonal variation in the monoterpene biosynthesis and the detailed structure of a revised hybrid algorithm, describing both de novo and pool emissions, should be determined in further studies to improve biological realism in the modelling of monoterpene emissions from Scots pine forests. The increasing number of DEC measurements of oxygenated VOCs will probably enable better estimates of the role of these compounds in plant physiology and tropospheric chemistry. Keywords: disjunct eddy covariance, lag time determination, long-term flux measurements, proton transfer reaction mass spectrometry, Scots pine forests, volatile organic compounds
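The lag time determination mentioned in this abstract can be illustrated with a minimal Python sketch. It shows only the basic idea of choosing the lag that maximizes the absolute covariance between vertical wind and concentration; the running-average refinement developed in the thesis is not reproduced. The function names, the sample-based lag unit, and the search window `max_lag` are assumptions made for illustration.

```python
def cross_covariance(w, c, lag):
    """Covariance between w[i] and c[i + lag] over the overlapping samples."""
    n = len(w) - lag
    w_seg = w[:n]
    c_seg = c[lag:lag + n]
    w_mean = sum(w_seg) / n
    c_mean = sum(c_seg) / n
    return sum((a - w_mean) * (b - c_mean) for a, b in zip(w_seg, c_seg)) / n


def find_lag(w, c, max_lag):
    """Return the lag (in samples) with the largest absolute covariance.

    Hypothetical helper: the absolute value is used so that both emission
    (positive flux) and deposition (negative flux) peaks are found.
    """
    covariances = [cross_covariance(w, c, lag) for lag in range(max_lag + 1)]
    return max(range(len(covariances)), key=lambda lag: abs(covariances[lag]))
```

Near the detection limit the covariance peak is weak relative to noise, which is why a plain maximum search like this one becomes unreliable and some form of averaging is needed.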
Abstract:
Topic detection and tracking (TDT) is an area of information retrieval research the focus of which revolves around news events. The problems TDT deals with relate to segmenting news text into cohesive stories, detecting something new, previously unreported, tracking the development of a previously reported event, and grouping together news that discuss the same event. The performance of the traditional information retrieval techniques based on full-text similarity has remained inadequate for online production systems. It has been difficult to make the distinction between same and similar events. In this work, we explore ways of representing and comparing news documents in order to detect new events and track their development. First, however, we put forward a conceptual analysis of the notions of topic and event. The purpose is to clarify the terminology and align it with the process of news-making and the tradition of story-telling. Second, we present a framework for document similarity that is based on semantic classes, i.e., groups of words with similar meaning. We adopt people, organizations, and locations as semantic classes in addition to general terms. As each semantic class can be assigned its own similarity measure, document similarity can make use of ontologies, e.g., geographical taxonomies. The documents are compared class-wise, and the outcome is a weighted combination of class-wise similarities. Third, we incorporate temporal information into document similarity. We formalize the natural language temporal expressions occurring in the text, and use them to anchor the rest of the terms onto the time-line. Upon comparing documents for event-based similarity, we look not only at matching terms, but also how near their anchors are on the time-line. Fourth, we experiment with an adaptive variant of the semantic class similarity system. 
The news reflects changes in the real world, and in order to keep up, the system has to change its behavior based on the contents of the news stream. We put forward two strategies for rebuilding the topic representations and report experiment results. We run experiments with three annotated TDT corpora. The use of semantic classes increased the effectiveness of topic tracking by 10-30% depending on the experimental setup. The gain in spotting new events remained lower, around 3-4%. Anchoring the text to a time-line based on the temporal expressions gave a further 10% increase in the effectiveness of topic tracking. The gains in detecting new events, again, remained smaller. The adaptive systems did not improve the tracking results.
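The class-wise comparison described in this abstract (one similarity per semantic class, combined with weights) can be sketched as follows. This is a minimal illustration, not the thesis's system: cosine similarity is assumed for every class, whereas the abstract allows each class its own measure, and the class names and weights in the usage note are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def document_similarity(doc_a, doc_b, weights):
    """Weighted combination of class-wise similarities.

    doc_a, doc_b: dict mapping a semantic class name to a list of tokens.
    weights: dict mapping a semantic class name to a non-negative weight.
    """
    total_weight = sum(weights.values())
    score = 0.0
    for cls, weight in weights.items():
        sim = cosine(Counter(doc_a.get(cls, [])), Counter(doc_b.get(cls, [])))
        score += weight * sim
    return score / total_weight
```

For example, with `weights = {"terms": 0.5, "locations": 0.5}`, two documents that share all location names but no general terms score 0.5.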
Abstract:
The dissertation deals with remote narrowband measurements of the electromagnetic radiation emitted by lightning flashes. A lightning flash consists of a number of sub-processes. The return stroke, which transfers electrical charge from the thundercloud to the ground, is electromagnetically an impulsive wideband process; that is, it emits radiation at most frequencies in the electromagnetic spectrum, but its duration is only some tens of microseconds. Before and after the return stroke, multiple sub-processes redistribute electrical charges within the thundercloud. These sub-processes can last for tens to hundreds of milliseconds, many orders of magnitude longer than the return stroke. Each sub-process causes radiation with specific time-domain characteristics, having maxima at different frequencies. Thus, if the radiation is measured at a single narrow frequency band, it is difficult to identify the sub-processes, and some sub-processes can be missed altogether. However, narrowband detectors are simple to design and miniaturize. In particular, near the High Frequency band (HF, 3 MHz to 30 MHz), ordinary shortwave radios can, in principle, be used as detectors. This dissertation utilizes a prototype detector which is essentially a handheld AM radio receiver. Measurements were made in Scandinavia, and several independent data sources were used to identify lightning sub-processes, as well as the distance to each individual flash. It is shown that multiple sub-processes radiate strongly near the HF band. The return stroke usually radiates intensely, but it cannot be reliably identified from the time-domain signal alone. This means that a narrowband measurement is best used to characterize the energy of the radiation integrated over the whole flash, without attempting to identify individual processes. The dissertation analyzes the conditions under which this integrated energy can be used to estimate the distance to the flash.
It is shown that flash-by-flash variations are large, but the integrated energy is very sensitive to changes in the distance, dropping as approximately the inverse cube root of the distance. Flashes can, in principle, be detected at distances of more than 100 km, but since the ground conductivity can vary, ranging accuracy drops dramatically at distances larger than 20 km. These limitations mean that individual flashes cannot be ranged accurately using a single narrowband detector, and the useful range is limited to 30 kilometers at the most. Nevertheless, simple statistical corrections are developed, which enable an accurate estimate of the distance to the closest edge of an active storm cell, as well as the approach speed. The results of the dissertation could therefore have practical applications in real-time short-range lightning detection and warning systems.
Abstract:
The present challenge in drug discovery is to synthesize new compounds efficiently in minimal time. The trend is towards carefully designed and well-characterized compound libraries because fast and effective synthesis methods easily produce thousands of new compounds. The need for rapid and reliable analysis methods increases at the same time. Quality assessment, including the identification and purity tests, is highly important since false (negative or positive) results, for instance in tests of biological activity or determination of early-ADME parameters in vitro (the pharmacokinetic study of drug absorption, distribution, metabolism, and excretion), must be avoided. This thesis summarizes the principles of classical planar chromatographic separation combined with ultraviolet (UV) and mass spectrometric (MS) detection, and introduces powerful, rapid, easy, low-cost, and alternative tools and techniques for qualitative and quantitative analysis of small drug or drug-like molecules. High performance thin-layer chromatography (HPTLC) was introduced and evaluated for fast semi-quantitative assessment of the purity of synthesis target compounds. HPTLC methods were compared with the liquid chromatography (LC) methods. Electrospray ionization mass spectrometry (ESI MS) and atmospheric pressure matrix-assisted laser desorption/ionization MS (AP MALDI MS) were used to identify and confirm the product zones on the plate. AP MALDI MS was rapid, and easy to carry out directly on the plate without scraping. The PLC method was used to isolate target compounds from crude synthesized products and purify them for bioactivity and preliminary ADME tests. Ultra-thin-layer chromatography (UTLC) with AP MALDI MS and desorption electrospray ionization mass spectrometry (DESI MS) was introduced and studied for the first time. Because of the thinner adsorbent layer, the monolithic UTLC plate provided 10-100 times better sensitivity in MALDI analysis than did HPTLC plates.
The limits of detection (LODs) down to low picomole range were demonstrated for UTLC AP MALDI and UTLC DESI MS. In a comparison of AP and vacuum MALDI MS detection for UTLC plates, desorption from the irregular surface of the plates with the combination of an external AP MALDI ion source and an ion trap instrument provided clearly less variation in mass accuracy than the vacuum MALDI time-of-flight (TOF) instrument. The performance of the two-dimensional (2D) UTLC separation with AP MALDI MS method was studied for the first time. The influence of the urine matrix on the separation and the repeatability was evaluated with benzodiazepines as model substances in human urine. The applicability of 2D UTLC AP MALDI MS was demonstrated in the detection of metabolites in an authentic urine sample.
Abstract:
In order to improve and continuously develop the quality of pharmaceutical products, the process analytical technology (PAT) framework has been adopted by the US Food and Drug Administration. One of the aims of PAT is to identify critical process parameters and their effect on the quality of the final product. Real time analysis of the process data enables better control of the processes to obtain a high quality product. The main purpose of this work was to monitor crucial pharmaceutical unit operations (from blending to coating) and to examine the effect of processing on solid-state transformations and physical properties. The tools used were near-infrared (NIR) and Raman spectroscopy combined with multivariate data analysis, as well as X-ray powder diffraction (XRPD) and terahertz pulsed imaging (TPI). To detect process-induced transformations in active pharmaceutical ingredients (APIs), samples were taken after blending, granulation, extrusion, spheronisation, and drying. These samples were monitored by XRPD, Raman, and NIR spectroscopy showing hydrate formation in the case of theophylline and nitrofurantoin. For erythromycin dihydrate formation of the isomorphic dehydrate was critical. Thus, the main focus was on the drying process. NIR spectroscopy was applied in-line during a fluid-bed drying process. Multivariate data analysis (principal component analysis) enabled detection of the dehydrate formation at temperatures above 45°C. Furthermore, a small-scale rotating plate device was tested to provide an insight into film coating. The process was monitored using NIR spectroscopy. A calibration model, using partial least squares regression, was set up and applied to data obtained by in-line NIR measurements of a coating drum process. The predicted coating thickness agreed with the measured coating thickness. For investigating the quality of film coatings TPI was used to create a 3-D image of a coated tablet. 
With this technique it was possible to determine coating layer thickness, distribution, reproducibility, and uniformity. In addition, it was possible to localise defects of either the coating or the tablet. It can be concluded from this work that the applied techniques increased the understanding of physico-chemical properties of drugs and drug products during and after processing. They additionally provided useful information to improve and verify the quality of pharmaceutical dosage forms.
Abstract:
The auditory system can detect occasional changes (deviants) in acoustic regularities without the need for subjects to focus their attention on the sound material. Deviant detection is reflected in the elicitation of the mismatch negativity component (MMN) of the event-related potentials. In the studies presented in this thesis, the MMN is used to investigate the auditory abilities for detecting similarities and regularities in sound streams. To investigate the limits of these processes, professional musicians have been tested in some of the studies. The results show that auditory grouping is already more advanced in musicians than in nonmusicians and that the auditory system of musicians can, unlike that of nonmusicians, detect a numerical regularity of always four tones in a series. These results suggest that sensory auditory processing in musicians is not only a fine tuning of universal abilities, but is also qualitatively more advanced than in nonmusicians. In addition, the relationship between the auditory change-detection function and perception is examined. It is shown that, contrary to the generally accepted view, MMN elicitation does not necessarily correlate with perception. The outcome of the auditory change-detection function can be implicit and the implicit knowledge of the sound structure can, after training, be utilized for behaviorally correct intuitive sound detection. These results illustrate the automatic character of the sensory change detection function.
Abstract:
The earliest stages of human cortical visual processing can be conceived as extraction of local stimulus features. However, more complex visual functions, such as object recognition, require integration of multiple features. Recently, neural processes underlying feature integration in the visual system have been under intensive study. A specialized mid-level stage preceding the object recognition stage has been proposed to account for the processing of contours, surfaces and shapes as well as configuration. This thesis consists of four experimental, psychophysical studies on human visual feature integration. In two studies, classification image, a recently developed psychophysical reverse correlation method, was used. In this method visual noise is added to near-threshold stimuli. By investigating the relationship between random features in the noise and the observer's perceptual decision in each trial, it is possible to estimate what features of the stimuli are critical for the task. The method allows visualizing the critical features that are used in a psychophysical task directly as a spatial correlation map, yielding an effective "behavioral receptive field". Visual context is known to modulate the perception of stimulus features. Some of these interactions are quite complex, and it is not known whether they reflect early or late stages of perceptual processing. The first study investigated the mechanisms of collinear facilitation, where nearby collinear Gabor flankers increase the detectability of a central Gabor. The behavioral receptive field of the mechanism mediating the detection of the central Gabor stimulus was measured by the classification image method. The results show that collinear flankers increase the extent of the behavioral receptive field for the central Gabor, in the direction of the flankers. The increased sensitivity at the ends of the receptive field suggests a low-level explanation for the facilitation.
The second study investigated how visual features are integrated into percepts of surface brightness. A novel variant of the classification image method with a brightness matching task was used. Many theories assume that perceived brightness is based on the analysis of luminance border features. Here, for the first time, this assumption was directly tested. The classification images show that the perceived brightness of both an illusory Craik-O'Brien-Cornsweet stimulus and a real uniform step stimulus depends solely on the border. Moreover, the spatial tuning of the features remains almost constant when the stimulus size is changed, suggesting that brightness perception is based on the output of a single spatial frequency channel. The third and fourth studies investigated global form integration in random-dot Glass patterns. In these patterns, a global form can be immediately perceived, if even a small proportion of random dots are paired to dipoles according to a geometrical rule. In the third study the discrimination of orientation structure in highly coherent concentric and Cartesian (straight) Glass patterns was measured. The results showed that the global form was more efficiently discriminated in concentric patterns. The fourth study investigated how form detectability depends on the global regularity of the Glass pattern. The local structure was either Cartesian or curved. It was shown that randomizing the local orientation deteriorated the performance only with the curved pattern. The results give support for the idea that curved and Cartesian patterns are processed in at least partially separate neural systems.
Human cortical functions in auditory change detection evaluated with multiple brain research methods
Abstract:
The objectives of this study were to determine secular trends of diabetes prevalence in China and to develop simple risk assessment algorithms for screening individuals at high risk for diabetes or with undiagnosed diabetes in Chinese and Indian adults. Two consecutive population-based surveys in Chinese adults and a prospective study in Mauritian Indians were involved in this study. The Chinese surveys were conducted in randomly selected populations aged 20-74 years in 2001-2002 (n=14 592) and 35-74 years in 2006 (n=4416). A two-step screening strategy using fasting capillary plasma glucose (FCG) as the first-line screening test followed by standard 2-hour 75g oral glucose tolerance tests (OGTTs) was applied to 12 436 individuals in 2001, while OGTTs were administered to all participants together with FCG in 2006 and to 2156 subjects in 2002. In Mauritius, two consecutive population-based surveys were conducted in Mauritian Indians aged 20-65 years in 1987 and 1992; 3094 Indians (1141 men), who were not diagnosed with diabetes at baseline, were reexamined with OGTTs in 1992 and/or 1998. Diabetes and pre-diabetes were defined following the 2006 World Health Organization/International Diabetes Federation criteria. The age-standardized, as well as age- and sex-specific, prevalence of diabetes and pre-diabetes in adult Chinese increased significantly from 12.2% and 15.4% in 2001 to 16.0% and 21.2% in 2006, respectively. A simple Chinese diabetes risk score was developed based on the data of the Chinese survey 2001-2002 and validated in the population of the 2006 survey. The risk scores based on β coefficients derived from the final logistic regression model ranged from 3 to 32. When the score was applied to the population of the 2006 survey, the area under the receiver operating characteristic curve (AUC) of the score for screening undiagnosed diabetes was 0.67 (95% CI, 0.65-0.70), which was lower than the AUC of FCG (0.76 [0.74-0.79]), but similar to that of HbA1c (0.68 [0.65-0.71]).
At a cut-off point of 14, the sensitivity and specificity of the risk score in screening undiagnosed diabetes were 0.84 (0.81-0.88) and 0.40 (0.38-0.41). In Mauritian Indians, body mass index (BMI), waist girth, family history of diabetes (FH), and glucose were confirmed to be independent risk predictors for developing diabetes. Predicted probabilities for developing diabetes derived from a simple Cox regression model fitted with sex, FH, BMI and waist girth ranged from 0.05 to 0.64 in men and 0.03 to 0.49 in women. To predict the onset of diabetes, the AUC of the predicted probabilities was 0.62 (95% CI, 0.56-0.68) in men and 0.64 (0.59-0.69) in women. At a cut-off point of 0.12, the sensitivity and specificity were 0.72 (0.71-0.74) and 0.47 (0.45-0.49) in men, and 0.77 (0.75-0.78) and 0.50 (0.48-0.52) in women, respectively. In conclusion, there was a rapid increase in the prevalence of diabetes in Chinese adults from 2001 to 2006. The simple risk assessment algorithms based on age, obesity and family history of diabetes showed a moderate discrimination of diabetes from non-diabetes, and may be used as a first-line screening tool for diabetes and pre-diabetes, and for health promotion purposes in Chinese and Indians.
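The sensitivity and specificity reported at a given cut-off point follow from a simple count of screening outcomes. The sketch below assumes that a score at or above the cut-off counts as screen-positive, which the abstract does not state; the function name and the sample data in the usage note are hypothetical, not values from the study.

```python
def sensitivity_specificity(scores, has_diabetes, cutoff):
    """Return (sensitivity, specificity) for a risk score at a cut-off.

    Assumption: a score >= cutoff is treated as a positive screening result.
    """
    tp = fn = tn = fp = 0
    for score, diseased in zip(scores, has_diabetes):
        positive = score >= cutoff
        if diseased and positive:
            tp += 1          # correctly flagged case
        elif diseased:
            fn += 1          # missed case
        elif positive:
            fp += 1          # false alarm
        else:
            tn += 1          # correctly cleared non-case
    return tp / (tp + fn), tn / (tn + fp)
```

Calling `sensitivity_specificity([10, 14, 20, 8, 16, 12], [False, True, True, False, False, True], 14)` on this made-up sample gives two of three cases detected and two of three non-cases cleared. Lowering the cut-off raises sensitivity at the cost of specificity, which is the trade-off behind the 0.84 and 0.40 pair quoted above.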
Abstract:
A population-based early detection program for breast cancer has been in progress in Finland since 1987. According to regulations during the study period 1987-2001, free of charge mammography screening was offered every second year to women aged 50-59 years. Recently, it was decided to extend the screening service to the age group 50-69. However, the scope of the program is still frequently discussed in public and information about potential impacts of mass-screening practice changes on future breast cancer burden is required. The aim of this doctoral thesis is to present methodologies for taking into account the mass-screening invitation information in breast cancer burden predictions, and to present alternative breast cancer incidence and mortality predictions up to 2012 based on scenarios of the future screening policy. The focus of this work is not on assessing the absolute efficacy but the effectiveness of mass-screening, and, by utilizing the data on invitations, on showing the estimated impacts of changes in an existing screening program on the short-term predictions. The breast cancer mortality predictions are calculated using a model that combines incidence, cause-specific and other cause survival on individual level. The screening invitation data are incorporated into modeling of breast cancer incidence and survival by dividing the program into separate components (first and subsequent rounds and years within them, breaks, and post screening period) and defining a variable that gives the component of the screening program. The incidence is modeled using a Poisson regression approach and the breast cancer survival by applying a parametric mixture cure model, where the patient population is allowed to be a combination of cured and uncured patients. The patients' risk of dying from causes other than breast cancer is allowed to differ from that of a corresponding general population group and to depend on age and follow-up time.
As a result, the effects of separate components of the screening program on incidence, proportion of cured and the survival of the uncured are quantified. According to the predictions, the impacts of policy changes, like extending the program from age group 50-59 to 50-69, are clearly visible on incidence while the effects on mortality in age group 40-74 are minor. Extending the screening service would increase the incidence of localized breast cancers but decrease the rates of non-localized breast cancer. There were no major differences between mortality predictions yielded by alternative future scenarios of the screening policy: Any policy change would have at the most a 3.0% reduction on overall breast cancer mortality compared to continuing the current practice in the near future.
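For orientation, the two model components named in this abstract are conventionally written as below. The notation is assumed here and is not taken from the thesis: $\pi$ is the cured proportion, $S_u(t)$ the survival function of the uncured patients, $\mu_i$ the expected number of new cases in stratum $i$, $n_i$ the person-years at risk, and $x_i$ the covariates, which would include the screening-program component.

```latex
% Mixture cure model: population survival as a mix of cured and uncured patients
S(t) = \pi + (1 - \pi)\, S_u(t)

% Poisson regression for incidence with a person-years offset
\log \mu_i = \log n_i + x_i^{\top} \beta
```

Under this formulation, the effect of a screening component can enter through $\pi$, through $S_u(t)$, or through $\beta$, which matches the abstract's separation into proportion cured, survival of the uncured, and incidence.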
Abstract:
Burnt area mapping in humid tropical insular Southeast Asia using medium resolution (250-500m) satellite imagery is characterized by persisting cloud cover, wide range of land cover types, vast amount of wetland areas and highly varying fire regimes. The objective of this study was to deepen understanding of three major aspects affecting the implementation and limits of medium resolution burnt area mapping in insular Southeast Asia: 1) fire-induced spectral changes, 2) most suitable multitemporal compositing methods and 3) burn scars patterns and size distribution. The results revealed a high variation in fire-induced spectral changes depending on the pre-fire greenness of burnt area. It was concluded that this variation needs to be taken into account in change detection based burnt area mapping algorithms in order to maximize the potential of medium resolution satellite data. Minimum near infrared (MODIS band 2, 0.86μm) compositing method was found to be the most suitable for burnt area mapping purposes using Moderate Resolution Imaging Spectroradiometer (MODIS) data. In general, medium resolution burnt area mapping was found to be usable in the wetlands of insular Southeast Asia, whereas in other areas the usability was seriously jeopardized by the small size of burn scars. The suitability of medium resolution data for burnt area mapping in wetlands is important since recently Southeast Asian wetlands have become a major point of interest in many fields of science due to yearly occurring wild fires that not only degrade these unique ecosystems but also create regional haze problem and release globally significant amounts of carbon into the atmosphere due to burning peat. Finally, super-resolution MODIS images were tested but the test failed to improve the detection of small scars. Therefore, super-resolution technique was not considered to be applicable to regional level burnt area mapping in insular Southeast Asia.
Abstract:
The aim of this thesis is to develop a fully automatic lameness detection system that operates in a milking robot. The instrumentation, measurement software, algorithms for data analysis and a neural network model for lameness detection were developed. Automatic milking has become a common practice in dairy husbandry, and in the year 2006 about 4000 farms worldwide used over 6000 milking robots. There is a worldwide movement with the objective of fully automating every process from feeding to milking. Increase in automation is a consequence of increasing farm sizes, the demand for more efficient production and the growth of labour costs. As the level of automation increases, the time that the cattle keeper uses for monitoring animals often decreases. This has created a need for systems for automatically monitoring the health of farm animals. The popularity of milking robots also offers a new and unique possibility to monitor animals in a single confined space up to four times daily. Lameness is a crucial welfare issue in the modern dairy industry. Limb disorders cause serious welfare, health and economic problems especially in loose housing of cattle. Lameness causes losses in milk production and leads to early culling of animals. These costs could be reduced with early identification and treatment. At present, only a few methods for automatically detecting lameness have been developed, and the most common methods used for lameness detection and assessment are various visual locomotion scoring systems. The problem with locomotion scoring is that it needs experience to be conducted properly, it is labour intensive as an on-farm method and the results are subjective. A four balance system for measuring the leg load distribution of dairy cows during milking in order to detect lameness was developed and set up in the University of Helsinki Research farm Suitia. 
The leg weights of 73 cows were successfully recorded during almost 10,000 robotic milkings over a period of 5 months. The cows were locomotion scored weekly, and the lame cows were inspected clinically for hoof lesions. Unsuccessful measurements, caused by cows standing outside the balances, were removed from the data with a special algorithm, and the mean leg loads and the number of kicks during milking were calculated. In order to develop an expert system to automatically detect lameness cases, a model was needed. A probabilistic neural network (PNN) classifier model was chosen for the task. The data was divided into two parts and 5,074 measurements from 37 cows were used to train the model. The operation of the model was evaluated for its ability to detect lameness in the validating dataset, which had 4,868 measurements from 36 cows. The model was able to classify 96% of the measurements correctly as sound or lame cows, and 100% of the lameness cases in the validation data were identified. The number of measurements causing false alarms was 1.1%. The developed model has the potential to be used for on-farm decision support and can be used in a real-time lameness monitoring system.
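A probabilistic neural network of the kind named in this abstract estimates, for each class, a density as the average of Gaussian kernels centred on that class's training patterns, and assigns the class with the largest density. The following is a minimal sketch of that general technique only. The feature vectors (fractions of body weight on each leg), the labels, and the smoothing parameter `sigma` are hypothetical and are not the thesis's inputs or settings.

```python
import math


def pnn_classify(x, training, sigma=1.0):
    """Classify x with a basic probabilistic neural network.

    training: dict mapping a class label to a list of feature vectors.
    sigma: width of the Gaussian kernel (an assumed tuning parameter).
    Equal class priors are assumed.
    """
    best_label = None
    best_density = -1.0
    for label, patterns in training.items():
        density = 0.0
        for pattern in patterns:
            squared_distance = sum((a - b) ** 2 for a, b in zip(x, pattern))
            density += math.exp(-squared_distance / (2.0 * sigma ** 2))
        density /= len(patterns)  # average kernel response for this class
        if density > best_density:
            best_label, best_density = label, density
    return best_label
```

Because the network stores the training patterns directly, training amounts to choosing `sigma`; the cost is that every classification compares the input against all stored patterns.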
Detection of major mite pests of Apis mellifera and development of non-chemical control of varroasis
Abstract:
Knowing the chromosomal areas or actual genes affecting the traits under selection would add more information to be used in the selection decisions which would potentially lead to higher genetic response. The first objective of this study was to map quantitative trait loci (QTL) affecting economically important traits in the Finnish Ayrshire population. The second objective was to investigate the effects of using QTL information in marker-assisted selection (MAS) on the genetic response and the linkage disequilibrium between the different parts of the genome. Whole genome scans were carried out on a grand-daughter design with 12 half-sib families and a total of 493 sons. Twelve different traits were studied: milk yield, protein yield, protein content, fat yield, fat content, somatic cell score (SCS), mastitis treatments, other veterinary treatments, days open, fertility treatments, non-return rate, and calf mortality. The average spacing of the typed markers was 20 cM with 2 to 14 markers per chromosome. Associations between markers and traits were analyzed with multiple marker regression. Significance was determined by permutation and genome-wise P-values obtained by Bonferroni correction. The benefits from MAS were investigated by simulation: a conventional progeny testing scheme was compared to a scheme where QTL information was used within families to select among full-sibs in the male path. Two QTL on different chromosomes were modelled. The effects of different starting frequencies of the favourable alleles and different size of the QTL effects were evaluated. A large number of QTL, 48 in total, were detected at 5% or higher chromosome-wise significance. QTL for milk production were found on 8 chromosomes, for SCS on 6, for mastitis treatments on 1, for other veterinary treatments on 5, for days open on 7, for fertility treatments on 7, for calf mortality on 6, and for non-return rate on 2 chromosomes. 
In the simulation study the total genetic response was faster with MAS than with conventional selection and the advantage of MAS persisted over the studied generations. The rate of response and the difference between the selection schemes reflected clearly the changes in allele frequencies of the favourable QTL. The disequilibrium between the polygenes and QTL was always negative and it was larger with larger QTL size. The disequilibrium between the two QTL was larger with QTL of large effect and it was somewhat larger with MAS for scenarios with starting frequencies below 0.5 for QTL of moderate size and below 0.3 for large QTL. In conclusion, several QTL affecting economically important traits of dairy cattle were detected. Further studies are needed to verify these QTL, check their presence in the present breeding population, look for pleiotropy and fine map the most interesting QTL regions. The results of the simulation studies show that using MAS together with embryo transfer to pre-select young bulls within families is a useful approach to increase the genetic merit of the AI-bulls compared to conventional selection.