22 results for Search engine results page
in Helda - Digital Repository of University of Helsinki
Abstract:
Introduction. We estimate the total yearly volume of peer-reviewed scientific journal articles published worldwide as well as the share of these articles available openly on the Web either directly or as copies in e-print repositories. Method. We rely on data from two commercial databases (ISI and Ulrich's Periodicals Directory) supplemented by sampling and Google searches. Analysis. A central issue is the finding that ISI-indexed journals publish far more articles per year (111) than non-ISI-indexed journals (26), which means that the total figure we obtain is much lower than many earlier estimates. Our method of analysing the number of repository copies (green open access) differs from several earlier studies which have studied the number of copies in identified repositories, since we start from a random sample of articles and then test whether copies can be found by a Web search engine. Results. We estimate that in 2006 the total number of articles published was approximately 1,350,000. Of this number 4.6% became immediately openly available and an additional 3.5% after an embargo period of, typically, one year. Furthermore, usable copies of 11.3% could be found in subject-specific or institutional repositories or on the home pages of the authors. Conclusions. We believe our results are the most reliable so far published and, therefore, should be useful in the ongoing debate about Open Access among both academics and science policy makers. The method is replicable and also lends itself to longitudinal studies in the future.
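The percentages in the Results can be turned into approximate article counts. The following is a minimal sketch of that arithmetic, not part of the study itself. The variable and function names are hypothetical, and adding the three shares together assumes the categories do not overlap, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract above (estimates for 2006).
TOTAL_ARTICLES = 1_350_000

SHARES = {
    "immediately open": 0.046,
    "open after embargo": 0.035,
    "repository or home-page copy": 0.113,
}

def estimated_counts(total, shares):
    """Convert fractional shares into approximate article counts."""
    return {label: round(total * share) for label, share in shares.items()}

counts = estimated_counts(TOTAL_ARTICLES, SHARES)
for label, n in counts.items():
    print(f"{label}: ~{n:,} articles")

# Simple sum of the shares; ASSUMES the three categories are disjoint.
combined_share = sum(SHARES.values())
print(f"combined share: {combined_share:.1%}")
```

Under the non-overlap assumption, roughly one fifth of the 2006 articles would be openly available in some form.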
Abstract:
Background: The Internet has recently made possible the free global availability of scientific journal articles. Open Access (OA) can occur either via OA scientific journals, or via authors posting manuscripts of articles published in subscription journals in open web repositories. So far there have been few systematic studies showing how extensive OA is, in particular studies covering all fields of science. Methodology/Principal Findings: The proportion of peer-reviewed scholarly journal articles that are openly available in full text on the web was studied using a random sample of 1837 titles and a web search engine. Of articles published in 2008, 8.5% were freely available at the publishers’ sites. For an additional 11.9%, free manuscript versions could be found using search engines, making the overall OA percentage 20.4%. Chemistry (13%) had the lowest overall share of OA, Earth Sciences (33%) the highest. In medicine, biochemistry and chemistry, publishing in OA journals was more common. In all other fields author-posted manuscript copies dominated the picture. Conclusions/Significance: The results show that OA already has a significant positive impact on the availability of the scientific journal literature and that there are big differences between scientific disciplines in the uptake. Due to the lack of awareness of OA publishing among scientists in most fields outside physics, the results should be of general interest to all scholars. The results should also interest academic publishers, who need to take OA into account in their business strategies and copyright policies, as well as research funders, who, like the NIH, are starting to require OA availability of results from research projects they fund. The method and search tools developed also offer a good basis for more in-depth studies as well as longitudinal studies.
Abstract:
Human growth and attained height are determined by a combination of genetic and environmental effects, and in modern Western societies more than 80% of the observed variation in height is determined by genetic factors. Height is a fundamental human trait that is associated with many socioeconomic and psychosocial factors and health measures; however, little is known of the identity of the specific genes that influence height variation in the general population. This thesis work aimed to identify the genetic variants that influence height in the general population by genome-wide linkage analysis utilizing large family samples. The study focused on analysis of three separate sets of families consisting of: 1) 1,417 individuals from 277 Finnish families (FinnHeight), 2) 8,450 individuals from 3,817 families from Australia and Europe (EUHeight) and 3) 9,306 individuals from 3,302 families from the United States (USHeight). The most significant finding in this study was made in the Finnish family sample, where a locus in the chromosomal region 1p21 was linked to adult height. Several regions showed evidence for linkage in the Australian, European and US families, with 8q21 and 15q25 being the most significant. The region on 1p21 was followed up with further studies and we were able to show that the collagen 11-alpha-1 gene (COL11A1) residing at this location was associated with adult height. This association was also confirmed in an independent Finnish population cohort (Health 2000) consisting of 6,542 individuals. From this population sample, we estimated that homozygous males and females for this gene variant were 1.1 and 0.6 cm taller than the respective controls. In this thesis work we identified a gene variant in the COL11A1 gene that influences human height, although this variant alone explains only 0.1% of height variation in the Finnish population.
We also demonstrated in this study that special stratification strategies, such as performing sex-limited analyses, focusing on dizygotic twin pairs, analyzing ethnic groups within a population separately and utilizing homogeneous populations such as the Finns, can significantly improve the statistical power to detect QTL. We also concluded from the results of this study that even though genetic effects explain a great proportion of height variance, it is likely that there are tens or even hundreds of genes with small individual effects underlying the genetic architecture of height.
Abstract:
Schizophrenia is a severe mental disorder affecting 0.4-1% of the population worldwide. It is characterized by impairments in the perception of reality and by significant social or occupational dysfunction. The disorder is one of the major contributors to the global burden of diseases. Studies of twins, families, and adopted children point to strong genetic components for schizophrenia, but environmental factors also play a role in the pathogenesis of disease. Molecular genetic studies have identified several potential positional candidate genes. The strongest evidence for putative schizophrenia susceptibility loci relates to the genes encoding dysbindin (DTNBP1) and neuregulin (NRG1), but studies lack consistency in the precise genetic regions and alleles implicated. We have studied the role of three potential candidate genes by genotyping 28 single nucleotide polymorphisms in the DTNBP1, NRG1, and AKT1 genes in a large schizophrenia family sample consisting of 441 families with 865 affected individuals from Finland. Our results do not support a major role for these genes in the pathogenesis of schizophrenia in Finland. We have previously identified a region on chromosome 5q21-34 as a susceptibility locus for schizophrenia in a Finnish family sample. Recently, two studies reported association between the γ-aminobutyric acid type A receptor cluster of genes in this region and one study showed suggestive evidence for association with another regional gene encoding clathrin interactor 1 (CLINT1, also called Epsin 4 and ENTH). To further address the significance of these genes under the linkage peak in the Finnish families, we genotyped SNPs of these genes, and observed statistically significant association between variants of GABRG2 and schizophrenia. Furthermore, these variants also seem to affect the functioning of the working memory. Fetal events and obstetric complications are associated with schizophrenia.
Rh incompatibility has been implicated as a risk factor for schizophrenia in several epidemiological studies. We conducted a family-based candidate-gene study that assessed the role of maternal-fetal genotype incompatibility at the RhD locus in schizophrenia. There was significant evidence for an RhD maternal-fetal genotype incompatibility, and the risk ratio was estimated at 2.3. This is the first candidate-gene study to explicitly test for and provide evidence of a maternal-fetal genotype incompatibility mechanism in schizophrenia. In conclusion, in this thesis we found evidence that one GABA receptor subunit, GABRG2, is significantly associated with schizophrenia. Furthermore, it also seems to affect the functioning of the working memory. In addition, an RhD maternal-fetal genotype incompatibility increases the risk of schizophrenia two-fold.
Abstract:
Buffer zones are vegetated strip-edges of agricultural fields along watercourses. As linear habitats in agricultural ecosystems, buffer strips dominate and play a leading ecological role in many areas. This thesis focuses on the plant species diversity of the buffer zones in a Finnish agricultural landscape. The main objective of the present study is to identify the determinants of floral species diversity in arable buffer zones from local to regional levels. This study was conducted in a watershed area of a farmland landscape of southern Finland. The study area, Lepsämänjoki, is situated in the Nurmijärvi commune 30 km to the north of Helsinki, Finland. The biotope mosaics were mapped in GIS. A total of 59 buffer zones were surveyed, of which 29 were also sampled by plot. Firstly, two diversity components (species richness and evenness) were investigated to determine whether the relationship between the two is equal and predictable. I found no correlation between species richness and evenness. The relationship between richness and evenness is unpredictable in a small-scale human-shaped ecosystem. Ordination and correlation analyses show that richness and evenness may result from different ecological processes, and thus should be considered separately. Species richness correlated negatively with phosphorus content, and species evenness correlated negatively with the ratio of organic carbon to total nitrogen in soil. The lack of a consistent pattern in the relationship between these two components may be due to site-specific variation in resource utilization by plant species. Within-habitat configuration parameters (width, length, and area) were investigated to determine which is most effective for predicting species richness. More species per unit area increment could be obtained from widening the buffer strip than from lengthening it. The width of the strips is an effective determinant of plant species richness.
The increase in species diversity with an increase in the width of buffer strips may be due to cross-sectional habitat gradients within the linear patches. This result can serve as a reference for policy makers, and has application value in agricultural management. In the framework of metacommunity theory, I found that both mass effect (connectivity) and species sorting (resource heterogeneity) were likely to explain species composition and diversity on a local and regional scale. The local and regional processes were interactively dominated by the degree to which dispersal perturbs local communities. In the lowly and intermediately connected regions, species sorting was of primary importance to explain species diversity, while the mass effect surpassed species sorting in the highly connected region. Increasing connectivity in communities containing high habitat heterogeneity can lead to the homogenization of local communities, and consequently, to lower regional diversity, while local species richness was unrelated to the habitat connectivity. Of all species found, Anthriscus sylvestris, Phalaris arundinacea, and Phleum pratense significantly responded to connectivity, and showed high abundance in the highly connected region. We suggest that these species may play a role in switching the force from local resources to regional connectivity shaping the community structure. On the landscape context level, the different responses of local species richness and evenness to landscape context were investigated. Seven landscape structural parameters served to indicate landscape context on five scales. On all scales but the smallest, the Shannon-Wiener diversity of land covers (H') correlated positively with the local richness. The factor (H') showed the highest correlation coefficients with species richness on the second largest scale.
The edge density of arable fields was the only predictor that correlated with species evenness on all scales, and it showed the highest predictive power on the second smallest scale. The different predictive power of the factors on different scales showed a scale-dependent relationship between the landscape context and local plant species diversity, and indicated that different ecological processes determine species richness and evenness. The local richness of species depends on a regional process on large scales, which may relate to the regional species pool, while species evenness depends on a fine- or coarse-grained farming system, which may relate to the patch quality of the habitats of field edges near the buffer strips. My results suggested some guidelines for species diversity conservation in the agricultural ecosystem. To maintain a high level of species diversity in the strips, a high level of phosphorus in strip soil should be avoided. Widening the strips is the most effective means to improve species richness. Habitat connectivity is not always favorable to species diversity because increasing connectivity in communities containing high habitat heterogeneity can lead to the homogenization of local communities (beta diversity) and, consequently, to lower regional diversity. Overall, a synthesis of local and regional factors emerged as the model that best explains variations in plant species diversity. The studies also suggest that the effects of determinants on species diversity have a complex relationship with scale.
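The diversity measures named in this abstract can be computed from a list of per-species abundances. The sketch below is illustrative only. The Shannon-Wiener index follows its standard definition, but the abstract does not say which evenness index the thesis used, so Pielou's J' is an assumption here. The function names and sample plots are hypothetical.

```python
import math

def species_richness(abundances):
    """Number of species with non-zero abundance."""
    return sum(1 for a in abundances if a > 0)

def shannon_diversity(abundances):
    """Shannon-Wiener index H' = -sum(p_i * ln p_i) over species present."""
    total = sum(abundances)
    props = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in props)

def pielou_evenness(abundances):
    """Pielou's J' = H' / ln(S); ASSUMED evenness index, 0.0 when S < 2."""
    s = species_richness(abundances)
    if s < 2:
        return 0.0
    return shannon_diversity(abundances) / math.log(s)

# Two hypothetical plots with the same richness but different evenness.
plot_even = [10, 10, 10, 10]
plot_skewed = [37, 1, 1, 1]

for name, plot in [("even", plot_even), ("skewed", plot_skewed)]:
    print(name, species_richness(plot), round(pielou_evenness(plot), 3))
```

Both plots contain four species, yet their evenness differs sharply, which shows on toy data why the two components can vary independently.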
Abstract:
XML documents are becoming more and more common in various environments. In particular, enterprise-scale document management is commonly centred around XML, and desktop applications as well as online document collections are soon to follow. The growing number of XML documents increases the importance of appropriate indexing methods and search tools in keeping the information accessible. Therefore, we focus on content that is stored in XML format as we develop such indexing methods. Because XML is used for different kinds of content ranging all the way from records of data fields to narrative full-texts, the methods for Information Retrieval are facing a new challenge in identifying which content is subject to data queries and which should be indexed for full-text search. In response to this challenge, we analyse the relation of character content and XML tags in XML documents in order to separate the full-text from data. As a result, we are able to both reduce the size of the index by 5-6% and improve the retrieval precision as we select the XML fragments to be indexed. Besides being challenging, XML comes with many unexplored opportunities which have received little attention in the literature. For example, authors often tag the content they want to emphasise by using a typeface that stands out. The tagged content constitutes phrases that are descriptive of the content and useful for full-text search. They are simple to detect in XML documents, but also possible to confuse with other inline-level text. Nonetheless, the search results seem to improve when the detected phrases are given additional weight in the index. Similar improvements are reported when related content is associated with the indexed full-text including titles, captions, and references. Experimental results show that for certain types of document collections, at least, the proposed methods help us find the relevant answers.
Even when we know nothing about the document structure but the XML syntax, we are able to take advantage of the XML structure when the content is indexed for full-text search.
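One simple way to relate character content to XML tags is to measure how many characters of text each element carries per tag in its subtree. The sketch below illustrates that idea only; it is not the measure defined in the thesis. The `text_density` function, the threshold of 20 characters per element, and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET

def text_density(elem):
    """Characters of text per element in the subtree rooted at elem."""
    n_chars = sum(len(t.strip()) for t in elem.itertext())
    n_elems = sum(1 for _ in elem.iter())  # counts elem itself too
    return n_chars / n_elems

def fulltext_fragments(root, threshold=20.0):
    """Tags of the root's children whose density exceeds the threshold.

    The threshold is an ASSUMED cut-off chosen for this toy example.
    """
    return [child.tag for child in root if text_density(child) > threshold]

SAMPLE = """<doc>
  <meta><id>42</id><year>2007</year><lang>en</lang></meta>
  <body><p>XML documents are becoming more and more common in
  various environments.</p><p>Indexing methods keep them accessible.</p></body>
</doc>"""

root = ET.fromstring(SAMPLE)
print(fulltext_fragments(root))  # only the narrative part is selected
```

The record-like `meta` element has many tags and little text, so it falls below the cut-off, while the narrative `body` element is kept for full-text indexing.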
Abstract:
Analyzing statistical dependencies is a fundamental problem in all empirical science. Dependencies help us understand causes and effects, create new scientific theories, and invent cures to problems. Nowadays, large amounts of data are available, but efficient computational tools for analyzing the data are missing. In this research, we develop efficient algorithms for a commonly occurring search problem - searching for the statistically most significant dependency rules in binary data. We consider dependency rules of the form X->A or X->not A, where X is a set of positive-valued attributes and A is a single attribute. Such rules describe which factors either increase or decrease the probability of the consequent A. Classical examples are genetic and environmental factors, which can either cause or prevent a disease. The emphasis in this research is that the discovered dependencies should be genuine - i.e. they should also hold in future data. This is an important distinction from the traditional association rules, which - in spite of their name and a similar appearance to dependency rules - do not necessarily represent statistical dependencies at all or represent only spurious connections, which occur by chance. Therefore, the principal objective is to search for the rules with statistical significance measures. Another important objective is to search for only non-redundant rules, which express the real causes of dependence, without any occasional extra factors. The extra factors do not add any new information on the dependence, but can only blur it and make it less accurate in future data. The problem is computationally very demanding, because the number of all possible rules increases exponentially with the number of attributes. In addition, neither the statistical dependency nor the statistical significance is a monotonic property, which means that the traditional pruning techniques do not work.
As a solution, we first derive the mathematical basis for pruning the search space with any well-behaving statistical significance measures. The mathematical theory is complemented by a new algorithmic invention, which enables an efficient search without any heuristic restrictions. The resulting algorithm can be used to search for both positive and negative dependencies with any commonly used statistical measures, like Fisher's exact test, the chi-squared measure, mutual information, and z scores. According to our experiments, the algorithm scales well, especially with Fisher's exact test. It can easily handle even the densest data sets with 10,000-20,000 attributes. Still, the results are globally optimal, which is a remarkable improvement over the existing solutions. In practice, this means that the user does not have to worry whether the dependencies hold in future data or if the data still contains better, but undiscovered dependencies.
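To make the notion of a significant dependency rule concrete, the sketch below scores a single rule X->A on binary data with a one-sided Fisher's exact test. It shows the significance measure only, not the pruning or search algorithm developed in the thesis. The function names and the toy data set are hypothetical.

```python
from math import comb

def rule_counts(rows, x_attrs, a_attr):
    """Frequencies needed to test the rule X -> A on binary rows."""
    n = len(rows)
    n_x = sum(1 for r in rows if all(r[i] for i in x_attrs))
    n_a = sum(1 for r in rows if r[a_attr])
    n_xa = sum(1 for r in rows if r[a_attr] and all(r[i] for i in x_attrs))
    return n, n_x, n_a, n_xa

def fisher_exact_p(n, n_x, n_a, n_xa):
    """One-sided Fisher p-value for positive dependence between X and A.

    Probability of n_xa or more co-occurrences when the margins are fixed
    and X and A are independent (hypergeometric upper tail).
    """
    upper = min(n_x, n_a)
    tail = sum(comb(n_x, k) * comb(n - n_x, n_a - k)
               for k in range(n_xa, upper + 1))
    return tail / comb(n, n_a)

# Toy data: columns 0 and 1 form X, column 2 is the consequent A.
rows = [
    (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 0, 0),
    (0, 1, 0), (0, 0, 0), (0, 0, 0), (0, 0, 1),
]
counts = rule_counts(rows, x_attrs=(0, 1), a_attr=2)
p_value = fisher_exact_p(*counts)
print(counts, round(p_value, 4))
```

A smaller p-value means the co-occurrence of X and A is less likely to be a chance artefact. An exhaustive search would have to evaluate such a measure over exponentially many candidate sets X, which is why pruning matters.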
Abstract:
Current smartphones have a storage capacity of several gigabytes. More and more information is stored on mobile devices. To meet the challenge of information organization, we turn to desktop search. Users often possess multiple devices, and synchronize (subsets of) information between them. This makes file synchronization more important. This thesis presents Dessy, a desktop search and synchronization framework for mobile devices. Dessy uses desktop search techniques, such as indexing, query and index term stemming, and search relevance ranking. Dessy finds files by their content, metadata, and context information. For example, PDF files may be found by their author, subject, title, or text. EXIF data of JPEG files may be used in finding them. User-defined tags can be added to files to organize and retrieve them later. Retrieved files are ranked according to their relevance to the search query. The Dessy prototype uses the BM25 ranking function, used widely in information retrieval. Dessy provides an interface for locating files for both users and applications. Dessy is closely integrated with the Syxaw file synchronizer, which provides efficient file and metadata synchronization, optimizing network usage. Dessy supports synchronization of search results, individual files, and directory trees. It allows finding and synchronizing files that reside on remote computers, or on the Internet. Dessy is designed to solve the problem of efficient mobile desktop search and synchronization, also supporting remote and Internet search. Remote searches may be carried out offline using a downloaded index, or while connected to the remote machine on a weak network. To secure user data, transmissions between the Dessy client and server are encrypted using symmetric encryption. Symmetric encryption keys are exchanged with RSA key exchange. Dessy emphasizes extensibility. The cryptography can also be extended. Users may tag their files with context tags and control custom file metadata.
Adding new indexed file types, metadata fields, ranking methods, and index types is easy. Finding files is done with virtual directories, which are views into the user’s files, browsable by regular file managers. On mobile devices, the Dessy GUI provides easy access to the search and synchronization system. This thesis includes results of Dessy synchronization and search experiments, including power usage measurements. Finally, Dessy has been designed with mobility and device constraints in mind. It requires only MIDP 2.0 Mobile Java with FileConnection support, and Java 1.5 on desktop machines.
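The abstract names BM25 as the ranking function. The following is a minimal sketch of Okapi BM25 over tokenised documents, written in Python for illustration and unrelated to the Java code of the Dessy prototype. The parameter defaults `k1=1.2` and `b=0.75` and the non-negative IDF variant are assumptions, since the abstract does not specify them. The sample file contents are hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenised document against the query with Okapi BM25.

    k1, b and the "+ 1" inside the IDF logarithm are ASSUMED conventions.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical index terms extracted from three files.
docs = [
    ["holiday", "photo", "beach"],
    ["tax", "report", "pdf"],
    ["beach", "beach", "photo", "sunset"],
]
scores = bm25_scores(["beach", "photo"], docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Files that share no term with the query score zero, and repeated query terms raise a file's score with diminishing returns controlled by `k1`.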
Abstract:
A large fraction of an XML document typically consists of text data. The XPath query language allows text search via the equal, contains, and starts-with predicates. Such predicates can be efficiently implemented using a compressed self-index of the document's text nodes. Most queries, however, contain some parts querying the text of the document, plus some parts querying the tree structure. It is therefore a challenge to choose an appropriate evaluation order for a given query, which optimally leverages the execution speeds of the text and tree indexes. Here the SXSI system is introduced. It stores the tree structure of an XML document using a bit array of opening and closing brackets plus a sequence of labels, and stores the text nodes of the document using a global compressed self-index. On top of these indexes sits an XPath query engine that is based on tree automata. The engine uses fast counting queries of the text index in order to dynamically determine whether to evaluate top-down or bottom-up with respect to the tree structure. The resulting system has several advantages over existing systems: (1) on pure tree queries (without text search) such as the XPathMark queries, the SXSI system performs on par with or better than the fastest known systems MonetDB and Qizx, (2) on queries that use text search, SXSI outperforms the existing systems by 1-3 orders of magnitude (depending on the size of the result set), and (3) with respect to memory consumption, SXSI outperforms all other systems for counting-only queries.
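The tree representation described above, a bit array of opening and closing brackets plus a label sequence, can be illustrated with a small sketch. This is not SXSI code. The function names are hypothetical, and the matching bracket is found by a linear scan, whereas a succinct implementation would use rank/select-style structures for fast navigation.

```python
def encode_tree(node):
    """Depth-first encoding of a (label, children) tree.

    Returns (bits, labels): 1 opens a node, 0 closes it, and labels lists
    node labels in document order.
    """
    label, children = node
    bits, labels = [1], [label]
    for child in children:
        child_bits, child_labels = encode_tree(child)
        bits += child_bits
        labels += child_labels
    bits.append(0)
    return bits, labels

def find_close(bits, i):
    """Index of the closing bracket matching the opening bracket at i."""
    depth = 0
    for j in range(i, len(bits)):
        depth += 1 if bits[j] else -1
        if depth == 0:
            return j
    raise ValueError("unbalanced bracket sequence")

def subtree_size(bits, i):
    """Number of nodes in the subtree whose opening bracket is at i."""
    return (find_close(bits, i) - i + 1) // 2

# Hypothetical document: <book><title/><chapter><para/><para/></chapter></book>
tree = ("book", [("title", []), ("chapter", [("para", []), ("para", [])])])
bits, labels = encode_tree(tree)
print(bits)
print(labels)
print(subtree_size(bits, 3))  # subtree rooted at "chapter"
```

The structure costs two bits per node plus the labels, and subtree questions reduce to finding a matching bracket.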
Abstract:
We present the results of a search for Higgs bosons predicted in two-Higgs-doublet models, in the case where the Higgs bosons decay to tau lepton pairs, using 1.8 inverse fb of integrated luminosity of proton-antiproton collisions recorded by the CDF II experiment at the Fermilab Tevatron. Studying the observed mass distribution in events where one or both tau leptons decay leptonically, no evidence for a Higgs boson signal is observed. The result is used to infer exclusion limits in the two-dimensional parameter space of tan beta versus m(A).
Abstract:
A search for new physics using three-lepton (trilepton) data collected with the CDF II detector and corresponding to an integrated luminosity of 976 pb^-1 is presented. The standard model predicts a low rate of trilepton events, which makes some supersymmetric processes, such as chargino-neutralino production, measurable in this channel. The mu+mu+l signature is investigated, where l is an electron or a muon, with the additional requirement of large missing transverse energy. In this analysis, the lepton transverse momenta with respect to the beam direction (p_T) are as low as 5 GeV/c, a selection that improves the sensitivity to particles which are light as well as to ones which result in leptonically decaying tau leptons. At the same time, this low-p_T selection presents additional challenges due to the non-negligible heavy-quark background at low lepton momenta. This background is measured with an innovative technique using experimental data. Several dimuon and trilepton control regions are investigated, and good agreement between experimental results and standard-model predictions is observed. In the signal region, we observe one three-muon event and expect 0.4+/-0.1 mu+mu+l events.
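To see why one observed event against an expectation of 0.4 is unremarkable, a simple Poisson counting estimate is enough. The sketch below is a back-of-the-envelope illustration and not the statistical treatment used in the analysis. It assumes a pure Poisson background with mean 0.4 and ignores the quoted +/-0.1 uncertainty. The function names are hypothetical.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson process with this mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def prob_at_least(n_obs, mean):
    """Probability of observing n_obs or more events."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(n_obs))

expected_background = 0.4   # from the abstract; uncertainty ignored (ASSUMPTION)
observed = 1

p = prob_at_least(observed, expected_background)
print(f"P(N >= {observed} | mean = {expected_background}) = {p:.3f}")
```

Under these assumptions, background alone yields at least one event in roughly a third of identical experiments, so the observation is compatible with the standard-model prediction.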
Abstract:
We search for b to s\mu^+\mu^- transitions in B meson (B^+, B^0, or B^0_s) decays with 924pb^{-1} of p pbar collisions at sqrt(s)=1.96 TeV collected with the CDF II detector at the Fermilab Tevatron. We find excesses with significances of 4.5, 2.9, and 2.4 standard deviations in the B^+ to \mu^+\mu^-K^+, B^0 to \mu^+\mu^-K^*(892)^0, and B_s^0 to \mu^+\mu^-\phi decay modes, respectively. Using B to J/psi h (h = K^+, K^*(892)^0, phi) decays as normalization channels, we report branching fractions for the previously observed B^+ and B^0 decays, BR(B^+ to \mu^+\mu^-K^+)=(0.59\pm0.15\pm0.04) x 10^{-6}, and BR(B^0 to \mu^+\mu^-K^*(892)^0)=(0.81\pm0.30\pm0.10) x 10^{-6}, where the first uncertainty is statistical, and the second is systematic. These measurements are consistent with the world average results, and are competitive with the best available measurements. We set an upper limit on the relative branching fraction BR(B_s^0 to \mu^+\mu^-\phi)/BR(B_s^0 to J/\psi\phi)
Abstract:
We present the results of a search for pair production of the supersymmetric partner of the top quark (the stop quark $\tilde{t}_{1}$) decaying to a $b$-quark and a chargino $\tilde{\chi}_1^{\pm}$ with a subsequent $\tilde{\chi}_1^{\pm}$ decay into a neutralino $\tilde{\chi}_1^0$, lepton $\ell$, and neutrino $\nu$. Using a data sample corresponding to 2.7 fb$^{-1}$ of integrated luminosity of $p\bar{p}$ collisions at $\sqrt{s} = 1.96$ TeV collected by the CDF II detector, we reconstruct the mass of candidate stop events and fit the observed mass spectrum to a combination of standard model processes and stop quark signal. We find no evidence for $\tilde{t}_1\bar{\tilde{t}}_1$ production and set 95% C.L. limits on the masses of the stop quark and the neutralino for several values of the chargino mass and the branching ratio ${\cal B}(\tilde{\chi}_1^{\pm}\to\tilde{\chi}_1^0\ell^{\pm}\nu)$.
Abstract:
We present the results of a search for supersymmetry with gauge-mediated breaking and $\tilde{\chi}_1^0\to\gamma\tilde{G}$ in the $\gamma\gamma$+missing transverse energy final state. In $2.6\pm0.2$ fb$^{-1}$ of $p\bar{p}$ collisions at $\sqrt{s}=1.96$ TeV recorded by the CDF II detector, we observe no candidate events, consistent with a standard model background expectation of $1.4\pm0.4$ events. We set limits on the cross section at the 95% C.L. and place the world's best limit of 149 GeV/$c^2$ on the $\tilde{\chi}_1^0$ mass at $\tau_{\tilde{\chi}_1^0}$$