98 results for Search problems

in Helda - Digital Repository of University of Helsinki


Relevance:

30.00%

Abstract:

Segmentation is a data mining technique yielding simplified representations of sequences of ordered points. A sequence is divided into some number of homogeneous blocks, and all points within a segment are described by a single value. The focus in this thesis is on piecewise-constant segments, where the most likely description for each segment and the most likely segmentation into some number of blocks can be computed efficiently. Representing sequences as segmentations is useful in, e.g., storage and indexing tasks in sequence databases, and segmentation can be used as a tool in learning about the structure of a given sequence. The discussion in this thesis begins with basic questions related to segmentation analysis, such as choosing the number of segments, and evaluating the obtained segmentations. Standard model selection techniques are shown to perform well for the sequence segmentation task. Segmentation evaluation is proposed with respect to a known segmentation structure. Applying segmentation on certain features of a sequence is shown to yield segmentations that are significantly close to the known underlying structure. Two extensions to the basic segmentation framework are introduced: unimodal segmentation and basis segmentation. The former is concerned with segmentations where the segment descriptions first increase and then decrease, and the latter with the interplay between different dimensions and segments in the sequence. These problems are formally defined and algorithms for solving them are provided and analyzed. Practical applications for segmentation techniques include time series and data stream analysis, text analysis, and biological sequence analysis. In this thesis segmentation applications are demonstrated in analyzing genomic sequences.
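The claim that the best piecewise-constant segmentation can be computed efficiently can be illustrated with the classic dynamic programme for this problem. The sketch below is not taken from the thesis: it assumes that each block is described by its mean and that the cost is the summed squared error, and the function name `segment` is hypothetical.

```python
from itertools import accumulate


def segment(xs, k):
    """Split xs into k contiguous blocks, each described by its mean,
    minimising the total squared error (assumed cost function).
    Returns (cost, starts), where starts lists the first index of each block."""
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(xs)")

    # Prefix sums give the squared error of any block in constant time.
    s1 = [0.0] + list(accumulate(xs))
    s2 = [0.0] + list(accumulate(x * x for x in xs))

    def sse(i, j):
        """Squared error of block xs[i:j] around its own mean (i < j)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting xs[:j] into m blocks.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                cost = best[m - 1][i] + sse(i, j)
                if cost < best[m][j]:
                    best[m][j] = cost
                    back[m][j] = i

    # Walk the back-pointers to recover where each block starts.
    starts = []
    j = n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    starts.reverse()
    return best[k][n], starts
```

The triple loop runs in O(k n^2) time. Choosing the number of segments k, which the abstract treats as a model selection question, is left outside this sketch.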

Relevance:

30.00%

Abstract:

Analyzing statistical dependencies is a fundamental problem in all empirical science. Dependencies help us understand causes and effects, create new scientific theories, and invent cures to problems. Nowadays, large amounts of data are available, but efficient computational tools for analyzing the data are missing. In this research, we develop efficient algorithms for a commonly occurring search problem - searching for the statistically most significant dependency rules in binary data. We consider dependency rules of the form X->A or X->not A, where X is a set of positive-valued attributes and A is a single attribute. Such rules describe which factors either increase or decrease the probability of the consequent A. A classical example is genetic and environmental factors, which can either cause or prevent a disease. The emphasis in this research is that the discovered dependencies should be genuine - i.e. they should also hold in future data. This is an important distinction from the traditional association rules, which - in spite of their name and a similar appearance to dependency rules - do not necessarily represent statistical dependencies at all or represent only spurious connections, which occur by chance. Therefore, the principal objective is to search for the rules with statistical significance measures. Another important objective is to search for only non-redundant rules, which express the real causes of dependence, without any occasional extra factors. The extra factors do not add any new information on the dependence, but can only blur it and make it less accurate in future data. The problem is computationally very demanding, because the number of all possible rules increases exponentially with the number of attributes. In addition, neither statistical dependency nor statistical significance is a monotonic property, which means that the traditional pruning techniques do not work.
As a solution, we first derive the mathematical basis for pruning the search space with any well-behaved statistical significance measure. The mathematical theory is complemented by a new algorithmic invention, which enables an efficient search without any heuristic restrictions. The resulting algorithm can be used to search for both positive and negative dependencies with any commonly used statistical measures, like Fisher's exact test, the chi-squared measure, mutual information, and z scores. According to our experiments, the algorithm scales well, especially with Fisher's exact test. It can easily handle even the densest data sets with 10000-20000 attributes. Still, the results are globally optimal, which is a remarkable improvement over the existing solutions. In practice, this means that the user does not have to worry whether the dependencies hold in future data or if the data still contains better, but undiscovered dependencies.
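To make the significance measure concrete, the following sketch scores a single positive rule X->A with a one-sided Fisher's exact test. It covers only the scoring of one rule, not the search or pruning algorithm described in the abstract, and the function name and argument names are hypothetical.

```python
from math import comb


def fisher_p(n, n_x, n_a, n_xa):
    """One-sided Fisher's exact test for a positive rule X->A.

    n    : number of rows in the data
    n_x  : rows where every attribute in X is true
    n_a  : rows where A is true
    n_xa : rows where both X and A hold

    Returns the probability of seeing at least n_xa co-occurrences
    if X and A were independent and the margins n_x and n_a were fixed.
    """
    upper = min(n_x, n_a)
    # Hypergeometric upper tail; math.comb returns 0 for impossible tables.
    favourable = sum(
        comb(n_x, k) * comb(n - n_x, n_a - k) for k in range(n_xa, upper + 1)
    )
    return favourable / comb(n, n_a)
```

A smaller value indicates a more significant positive dependency. For example, `fisher_p(10, 5, 5, 5)` equals 1/252, because only one of the 252 ways to place A on 5 of 10 rows puts all of them on the rows covered by X.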

Relevance:

20.00%

Abstract:

In this study I offer a diachronic solution for a number of difficult inflectional endings in Old Church Slavic nominal declensions. In this context I address perhaps the most disputed and most important question of Slavic nominal inflectional morphology: whether there was in Proto-Slavic an Auslautgesetz (ALG), a law of final syllables, that narrowed the Proto-Indo-European vowel */o/ to */u/ in closed word-final syllables. In addition, the work contains an exhaustive morphological classification of the nouns and adjectives that occur in canonical Old Church Slavic. I argue that Proto-Indo-European */o/ became Proto-Slavic */u/ before word-final */s/ and */N/. This conclusion is based on the impossibility of finding credible analogical (as opposed to phonological) explanations for the forms supporting the ALG hypothesis, and on the survival of the neuter gender in Slavic. It is not likely that the */o/-stem nominative singular ending */-u/ was borrowed from the accusative singular, because the latter would have been the only paradigmatic form with the stem vowel */-u-/. It is equally unlikely that the ending */-u/ was borrowed from the */u/-stems, because the latter constituted a moribund class. The usually stated motivation for such an analogical borrowing, i.e. a need to prevent the merger of */o/-stem masculines with neuters of the same class, is not tenable. Extra-Slavic as well as intra-Slavic evidence suggests that phonologically triggered mergers between two semantically opaque genders do not tend to be prevented, but rather that such mergers lead to the loss of the gender opposition in question. On the other hand, if */-os/ had not become */-us/, most nouns and, most importantly, all adjectives and pronouns would have lost the formal distinction between masculines and neuters. This would have necessarily resulted in the loss of the neuter gender.
A new explanation is given for the most apparent piece of evidence against the ALG hypothesis, the nominative-accusative singular of the */es/-stem neuters, e.g. nebo 'sky'. I argue that it arose in late Proto-Slavic dialects, replacing regular nebe, under the influence of the */o/- and */yo/-stems where a correlation had emerged between a hard root-final consonant and the termination -o, on the one hand, and a soft root-final consonant and the termination -e, on the other.

Relevance:

20.00%

Abstract:

In visual search one tries to find the currently relevant item among other, irrelevant items. In the present study, visual search performance for complex objects (characters, faces, computer icons and words) was investigated, together with the contribution of different stimulus properties, such as luminance contrast between characters and background, set size, stimulus size, colour contrast, spatial frequency, and stimulus layout. Subjects were required to search for a target object among distractor objects in two-dimensional stimulus arrays. The outcome measure was threshold search time, that is, the presentation duration of the stimulus array required by the subject to find the target with a certain probability. It reflects the time used for visual processing separated from the time used for decision making and manual reactions. The duration of stimulus presentation was controlled by an adaptive staircase method. The number and duration of eye fixations, saccade amplitude, and perceptual span, i.e., the number of items that can be processed during a single fixation, were measured. It was found that search performance was correlated with the number of fixations needed to find the target. Search time and the number of fixations increased with increasing stimulus set size. On the other hand, several complex objects could be processed during a single fixation, i.e., within the perceptual span. Search time and the number of fixations depended on object type as well as luminance contrast. The size of the perceptual span was smaller for more complex objects, and decreased with decreasing luminance contrast within object type, especially for very low contrasts. In addition, the size and shape of perceptual span explained the changes in search performance for different stimulus layouts in word search.
Perceptual span was scale invariant for a 16-fold range of stimulus sizes, i.e., the number of items processed during a single fixation was independent of retinal stimulus size or viewing distance. It is suggested that saccadic visual search consists of both serial (eye movements) and parallel (processing within perceptual span) components, and that the size of the perceptual span may explain the effectiveness of saccadic search in different stimulus conditions. Further, low-level visual factors, such as the anatomical structure of the retina, peripheral stimulus visibility and resolution requirements for the identification of different object types are proposed to constrain the size of the perceptual span, and thus, limit visual search performance. Similar methods were used in a clinical study to characterise the visual search performance and eye movements of neurological patients with chronic solvent-induced encephalopathy (CSE). In addition, the data about the effects of different stimulus properties on visual search in normal subjects were presented as simple practical guidelines, so that the limits of human visual perception could be taken into account in the design of user interfaces.

Relevance:

20.00%

Abstract:

The aim of this study was to build a model and analyze how users move in a virtual environment and to explore the experiential dimensions connected with different ways of moving. Due to the lack of previous research on this subject, this was an explorative study. This study also aimed to identify the different ways in which users move in virtual environments and the background variables connected to them. It was hypothesized that fluent movement in virtual environments is connected to high presence, skill and challenge assessments. Test participants (n = 68) were mostly highly educated young adults. A virtual environment was built using a CAVE-type virtual reality interface. The task was to search for objects that do not belong in a normal house. The participants' movement in the virtual house was recorded on a computer. Movement was modelled using a cluster analysis of information-entropy-based movement measurements, acceleration, number of stops and time spent being stationary. The experiential dimensions were measured using the EVEQ questionnaire. We were able to identify four different ways of moving in virtual environments. With respect to background variables, the four groups differed only in the amount of weekly computer usage. However, fluent movement in virtual environments was connected to a high sense of presence. Furthermore, participants who moved fluently in the environment assessed their skills as being high and regarded the use of the virtual environment as challenging. The results indicate that different ways of moving affect how people experience virtual environments. Consequently, the participants' assessments of their skills and level of challenge have an impact on the affective evaluation of the situation at hand. Entropy measures have not been previously applied when studying movement, and in addition the role of movement in the experiential dimensions of virtual environments is an unexplored subject.
The movement analysis method introduced here is applicable to other research problems. Finally, this study expands on our knowledge of the special characteristics connected with the experiential dimensions of virtual environments.
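The abstract does not say how the entropy-based movement measurements were computed. As one possible reading, the sketch below discretises a recorded path into grid cells and takes the Shannon entropy of the visited-cell distribution; the grid discretisation, the cell size and both function names are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2


def to_cells(path, cell_size=1.0):
    """Map (x, y) positions onto integer grid cells (assumed discretisation)."""
    return [(int(x // cell_size), int(y // cell_size)) for x, y in path]


def visit_entropy(cells):
    """Shannon entropy, in bits, of how the samples are spread over cells."""
    counts = Counter(cells)
    total = len(cells)
    return -sum((c / total) * log2(c / total) for c in counts.values())
```

Under this reading, a participant who stays in one cell scores 0 bits, while one who spreads the samples evenly over four cells scores 2 bits. Such a value could be one input feature to a cluster analysis alongside acceleration and stop counts.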

Relevance:

20.00%

Abstract:

Due to the improved prognosis of many forms of cancer, an increasing number of cancer survivors are willing to return to work after their treatment. It is generally believed, however, that people with cancer are either unemployed, stay at home, or retire more often than people without cancer. This study investigated the problems that cancer survivors experience on the labour market, as well as the disease-related, sociodemographic and psychosocial factors at work that are associated with the employment and work ability of cancer survivors. The impact of cancer on employment was studied by combining data from the Finnish Cancer Registry with census data from Statistics Finland for the years 1985, 1990, 1995 or 1997. There were two data sets containing 46 312 and 12 542 people with cancer. The results showed that cancer survivors were slightly less often employed than their referents. Two to three years after the diagnosis the employment rate of the cancer survivors was 9 percentage points lower than that of their referents (64% vs. 73%), whereas the employment rate was the same before the diagnosis (78%). The employment rate varied greatly according to the cancer type and education. The probability of being employed was greater in the lower than in the higher educational groups. People with cancer were less often employed than people without cancer mainly because of their higher retirement rate (34% vs. 27%). Like employment, retirement varied by cancer type. The risk of retirement was twofold for people having cancer of the nervous system or people with leukaemia compared to their referents, whereas people with skin cancer, for example, did not have an increased risk of retirement. The aim of the questionnaire study was to investigate whether the work ability of cancer survivors differs from that of people without cancer and whether cancer had impaired their work ability. There were 591 cancer survivors and 757 referents in the data.
Even though the current work ability of cancer survivors did not differ from that of their referents, 26% of cancer survivors reported that their physical work ability, and 19% that their mental work ability, had deteriorated due to cancer. The survivors who had other diseases or had had chemotherapy most often reported impaired work ability, whereas survivors with a strong commitment to their work organization, or a good social climate at work, reported impairment less frequently. The aim of the other questionnaire study, containing 640 people with a history of cancer, was to examine the extent of social support that cancer survivors needed, and had received, from their work community. The cancer survivors had received most support from their co-workers, and they hoped for more support especially from the occupational health care personnel (39% of women and 29% of men). More support was especially needed by men who had lymphoma, had received chemotherapy or had a low education level. The results of this study show that the majority of the survivors are able to return to work. There is, however, a group of cancer survivors who leave work life early, have impaired work ability due to their illness, and suffer from lack of support from their work place and the occupational health services. Treatment-related as well as sociodemographic factors play an important role in survivors' work-related problems, and presumably their possibilities to continue working.

Relevance:

20.00%

Abstract:

Human growth and attained height are determined by a combination of genetic and environmental effects, and in modern Western societies more than 80% of the observed variation in height is determined by genetic factors. Height is a fundamental human trait that is associated with many socioeconomic and psychosocial factors and health measures; however, little is known of the identity of the specific genes that influence height variation in the general population. This thesis work aimed to identify the genetic variants that influence height in the general population by genome-wide linkage analysis utilizing large family samples. The study focused on analysis of three separate sets of families consisting of: 1) 1,417 individuals from 277 Finnish families (FinnHeight), 2) 8,450 individuals from 3,817 families from Australia and Europe (EUHeight) and 3) 9,306 individuals from 3,302 families from the United States (USHeight). The most significant finding in this study was made in the Finnish family sample, where a locus in the chromosomal region 1p21 was linked to adult height. Several regions showed evidence for linkage in the Australian, European and US families, with 8q21 and 15q25 being the most significant. The region on 1p21 was followed up with further studies and we were able to show that the collagen 11-alpha-1 gene (COL11A1) residing at this location was associated with adult height. This association was also confirmed in an independent Finnish population cohort (Health 2000) consisting of 6,542 individuals. From this population sample, we estimated that homozygous males and females for this gene variant were 1.1 and 0.6 cm taller than the respective controls. In this thesis work we identified a gene variant in the COL11A1 gene that influences human height, although this variant alone explains only 0.1% of height variation in the Finnish population.
We also demonstrated in this study that special stratification strategies such as performing sex-limited analyses, focusing on dizygotic twin pairs, analyzing ethnic groups within a population separately and utilizing homogeneous populations such as the Finns can significantly improve the statistical power of finding QTL. Also, we concluded from the results of this study that even though genetic effects explain a great proportion of height variance, it is likely that there are tens or even hundreds of genes with small individual effects underlying the genetic architecture of height.

Relevance:

20.00%

Abstract:

Schizophrenia is a severe mental disorder affecting 0.4-1% of the population worldwide. It is characterized by impairments in the perception of reality and by significant social or occupational dysfunction. The disorder is one of the major contributors to the global burden of diseases. Studies of twins, families, and adopted children point to strong genetic components for schizophrenia, but environmental factors also play a role in the pathogenesis of disease. Molecular genetic studies have identified several potential positional candidate genes. The strongest evidence for putative schizophrenia susceptibility loci relates to the genes encoding dysbindin (DTNBP1) and neuregulin (NRG1), but studies lack impressive consistency in the precise genetic regions and alleles implicated. We have studied the role of three potential candidate genes by genotyping 28 single nucleotide polymorphisms in the DTNBP1, NRG1, and AKT1 genes in a large schizophrenia family sample consisting of 441 families with 865 affected individuals from Finland. Our results do not support a major role for these genes in the pathogenesis of schizophrenia in Finland. We have previously identified a region on chromosome 5q21-34 as a susceptibility locus for schizophrenia in a Finnish family sample. Recently, two studies reported association between the γ-aminobutyric acid type A receptor cluster of genes in this region and schizophrenia, and one study showed suggestive evidence for association with another regional gene encoding clathrin interactor 1 (CLINT1, also called Epsin 4 and ENTH). To further address the significance of these genes under the linkage peak in the Finnish families, we genotyped SNPs of these genes, and observed statistically significant association between variants of GABRG2 and schizophrenia. Furthermore, these variants also seem to affect the functioning of the working memory. Fetal events and obstetric complications are associated with schizophrenia.
Rh incompatibility has been implicated as a risk factor for schizophrenia in several epidemiological studies. We conducted a family-based candidate-gene study that assessed the role of maternal-fetal genotype incompatibility at the RhD locus in schizophrenia. There was significant evidence for an RhD maternal-fetal genotype incompatibility, and the risk ratio was estimated at 2.3. This is the first candidate-gene study to explicitly test for and provide evidence of a maternal-fetal genotype incompatibility mechanism in schizophrenia. In conclusion, in this thesis we found evidence that one GABA receptor subunit, GABRG2, is significantly associated with schizophrenia. Furthermore, it also seems to affect the functioning of the working memory. In addition, an RhD maternal-fetal genotype incompatibility increases the risk of schizophrenia twofold.

Relevance:

20.00%

Abstract:

Buffer zones are vegetated strip-edges of agricultural fields along watercourses. As linear habitats in agricultural ecosystems, buffer strips dominate and play a leading ecological role in many areas. This thesis focuses on the plant species diversity of the buffer zones in a Finnish agricultural landscape. The main objective of the present study is to identify the determinants of floral species diversity in arable buffer zones from local to regional levels. This study was conducted in a watershed area of a farmland landscape of southern Finland. The study area, Lepsämänjoki, is situated in the Nurmijärvi commune 30 km to the north of Helsinki, Finland. The biotope mosaics were mapped in GIS. A total of 59 buffer zones were surveyed, of which 29 were also sampled by plot. Firstly, two diversity components (species richness and evenness) were investigated to determine whether the relationship between the two is equal and predictable. I found no correlation between species richness and evenness. The relationship between richness and evenness is unpredictable in a small-scale human-shaped ecosystem. Ordination and correlation analyses show that richness and evenness may result from different ecological processes, and thus should be considered separately. Species richness correlated negatively with phosphorus content, and species evenness correlated negatively with the ratio of organic carbon to total nitrogen in soil. The lack of a consistent pattern in the relationship between these two components may be due to site-specific variation in resource utilization by plant species. Within-habitat configuration (width, length, and area) was investigated to determine which is more effective for predicting species richness. More species per unit area increment could be obtained from widening the buffer strip than from lengthening it. The width of the strips is an effective determinant of plant species richness.
The increase in species diversity with an increase in the width of buffer strips may be due to cross-sectional habitat gradients within the linear patches. This result can serve as a reference for policy makers, and has application value in agricultural management. In the framework of metacommunity theory, I found that both mass effect (connectivity) and species sorting (resource heterogeneity) were likely to explain species composition and diversity on a local and regional scale. The local and regional processes were interactively dominated by the degree to which dispersal perturbs local communities. In the lowly and intermediately connected regions, species sorting was of primary importance to explain species diversity, while the mass effect surpassed species sorting in the highly connected region. Increasing connectivity in communities containing high habitat heterogeneity can lead to the homogenization of local communities, and consequently, to lower regional diversity, while local species richness was unrelated to the habitat connectivity. Of all species found, Anthriscus sylvestris, Phalaris arundinacea, and Phleum pratense significantly responded to connectivity, and showed high abundance in the highly connected region. We suggest that these species may play a role in switching the force from local resources to regional connectivity shaping the community structure. On the landscape context level, the different responses of local species richness and evenness to landscape context were investigated. Seven landscape structural parameters served to indicate landscape context on five scales. On all scales but the smallest, the Shannon-Wiener diversity of land covers (H') correlated positively with the local richness. The factor (H') showed the highest correlation coefficients in species richness on the second largest scale.
The edge density of arable fields was the only predictor that correlated with species evenness on all scales, and it showed the highest predictive power on the second smallest scale. The different predictive power of the factors on different scales showed a scale-dependent relationship between the landscape context and local plant species diversity, and indicated that different ecological processes determine species richness and evenness. The local richness of species depends on a regional process on large scales, which may relate to the regional species pool, while species evenness depends on a fine- or coarse-grained farming system, which may relate to the patch quality of the habitats of field edges near the buffer strips. My results suggest some guidelines for species diversity conservation in the agricultural ecosystem. To maintain a high level of species diversity in the strips, a high level of phosphorus in strip soil should be avoided. Widening the strips is the most effective means to improve species richness. Habitat connectivity is not always favorable to species diversity because increasing connectivity in communities containing high habitat heterogeneity can lead to the homogenization of local communities (beta diversity) and, consequently, to lower regional diversity. Overall, a synthesis of local and regional factors emerged as the model that best explains variations in plant species diversity. The studies also suggest that the effects of determinants on species diversity have a complex relationship with scale.
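The two diversity components discussed in this abstract can be made concrete with a short calculation. The sketch below computes the Shannon-Wiener index H' from abundance counts and, as an assumed choice of evenness measure, Pielou's J = H' / ln(S), where S is the species richness. The abstract does not state which evenness index the thesis used, and the function names are hypothetical.

```python
from math import log


def shannon_wiener(counts):
    """Shannon-Wiener index H' (natural log) from abundance counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * log(p) for p in proportions)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S), with S the number of species present.

    Defined here as 0.0 when fewer than two species are present,
    a convention chosen for this sketch.
    """
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        return 0.0
    return shannon_wiener(counts) / log(richness)
```

Two plots with the same richness can differ sharply in evenness: counts of 10, 10, 10, 10 give J close to 1, while counts of 97, 1, 1, 1 give J of roughly 0.12. This shows how the two components can vary independently, as the abstract reports.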

Relevance:

20.00%

Abstract:

Advancements in analysis techniques have led to a rapid accumulation of biological data in databases. Such data often are in the form of sequences of observations, examples including DNA sequences and amino acid sequences of proteins. The scale and quality of the data promise answers to various biologically relevant questions in more detail than has been possible before. For example, one may wish to identify areas in an amino acid sequence that are important for the function of the corresponding protein, or investigate how characteristics on the level of DNA sequence affect the adaptation of a bacterial species to its environment. Many of the interesting questions are intimately associated with the understanding of the evolutionary relationships among the items under consideration. The aim of this work is to develop novel statistical models and computational techniques to meet the challenge of deriving meaning from the increasing amounts of data. Our main focus is on modeling the evolutionary relationships based on the observed molecular data. We operate within a Bayesian statistical framework, which allows a probabilistic quantification of the uncertainties related to a particular solution. As the basis of our modeling approach we utilize a partition model, which is used to describe the structure of data by appropriately dividing the data items into clusters of related items. Generalizations and modifications of the partition model are developed and applied to various problems. Large-scale data sets also pose a computational challenge. The models used to describe the data must be realistic enough to capture the essential features of the current modeling task but, at the same time, simple enough to make it possible to carry out the inference in practice. The partition model fulfills these two requirements. The problem-specific features can be taken into account by modifying the prior probability distributions of the model parameters.
The computational efficiency stems from the ability to integrate out the parameters of the partition model analytically, which enables the use of efficient stochastic search algorithms.
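What it means to integrate out the parameters analytically can be shown with a generic example. The sketch below is not the thesis's model: it assumes that each cluster emits symbols from a categorical distribution with a symmetric Dirichlet prior, in which case the cluster's marginal likelihood has a closed form (the Dirichlet-multinomial). The function names and the default prior weight `alpha` are hypothetical.

```python
from collections import Counter
from math import lgamma


def cluster_log_marginal(items, alphabet, alpha=1.0):
    """Log marginal likelihood of one cluster's symbols.

    The categorical parameters are integrated out under an assumed
    symmetric Dirichlet(alpha) prior, so no parameter values are needed.
    """
    counts = Counter(items)
    k = len(alphabet)
    n = len(items)
    score = lgamma(k * alpha) - lgamma(k * alpha + n)
    for symbol in alphabet:
        score += lgamma(alpha + counts[symbol]) - lgamma(alpha)
    return score


def partition_log_marginal(partition, alphabet, alpha=1.0):
    """Score a partition as the sum of its clusters' log marginals."""
    return sum(cluster_log_marginal(c, alphabet, alpha) for c in partition)
```

Because a partition can be scored directly in this way, a stochastic search only has to propose changes to the partition itself, for instance moving one item between clusters, and compare scores. With the alphabet "ACGT", the partition ["AAAA", "CCCC"] scores higher than ["AACC", "AACC"], reflecting the preference for homogeneous clusters.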