874 results for Distributed data access
Abstract:
SUMMARY: We present a tool designed for visualization of large-scale genetic and genomic data, exemplified by results from genome-wide association studies. This software provides an integrated framework to facilitate the interpretation of SNP association studies in a genomic context. Gene annotations can be retrieved from Ensembl, linkage disequilibrium data downloaded from HapMap and custom data imported in BED or WIG format. AssociationViewer integrates functionalities that enable the aggregation or intersection of data tracks. It implements an efficient cache system and allows several very large genomic datasets to be displayed. AVAILABILITY: The Java code for AssociationViewer is distributed under the GNU General Public Licence and has been tested on Microsoft Windows XP, Mac OS X and GNU/Linux operating systems. It is available from the SourceForge repository, which also includes a Java Web Start version, documentation and example data files.
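To give a purely illustrative sense of what intersecting two data tracks means here, the short Python sketch below overlaps two lists of BED-style intervals on a single chromosome. The function name and the coordinates are hypothetical and are not taken from the AssociationViewer code base, which is written in Java.

# Hypothetical sketch of intersecting two BED-style tracks (start, end) on one chromosome.
# This is not AssociationViewer code; it only illustrates the track-intersection operation.
def intersect_tracks(track_a, track_b):
    """Return the overlapping regions of two sorted interval lists."""
    result = []
    i = j = 0
    while i < len(track_a) and j < len(track_b):
        start = max(track_a[i][0], track_b[j][0])
        end = min(track_a[i][1], track_b[j][1])
        if start < end:                     # intervals overlap
            result.append((start, end))
        if track_a[i][1] < track_b[j][1]:   # advance the interval that ends first
            i += 1
        else:
            j += 1
    return result

# Example: association peaks vs. gene annotations (coordinates are made up).
peaks = [(100, 500), (800, 1200)]
genes = [(400, 900), (1100, 1500)]
print(intersect_tracks(peaks, genes))       # [(400, 500), (800, 900), (1100, 1200)]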
Abstract:
This paper explores the possibility of using data from social bookmarking services to measure the use of information by academic researchers. Social bookmarking data can be used to augment participative methods (e.g. interviews and surveys) and other, non-participative methods (e.g. citation analysis and transaction logs) to measure the use of scholarly information. We use BibSonomy, a free resource-sharing system, as a case study. Results show that published journal articles are by far the most popular type of source bookmarked, followed by conference proceedings and books. Commercial journal publisher platforms are the most popular type of information resource bookmarked, followed by websites, records in databases and digital repositories. Usage of open access information resources is low in comparison with toll access journals. In the case of open access repositories, there is a marked preference for subject-based repositories over institutional repositories. The results are consistent with those observed in related studies based on surveys and citation analysis, confirming the possible use of bookmarking data in studies of information behaviour in academic settings. The main advantages of using social bookmarking data are that it is an unobtrusive approach, it captures the reading habits of researchers who are not necessarily authors, and the data are readily available. The main limitation is that considerable human effort is required to clean and standardize the data.
Abstract:
Introduction. The DRIVER I project drew up a detailed report of European repositories based on data gathered in a survey in which Spain's participation was very low. This created a highly distorted image of the implementation of repositories in Spain. This study aims to analyse the current state of Spanish open-access institutional repositories and to describe their characteristics. Method. The data were gathered through a Web survey. The questionnaire was based on that used by DRIVER I: coverage; technical infrastructure and technical issues; institutional policies; services created; and stimulators and inhibitors for establishing, filling and maintaining their digital institutional repositories. Analysis. Data were tabulated and analysed systematically according to the responses obtained from the questionnaire and grouped by coverage. Results. Responses were obtained from 38 of the 104 institutions contacted, which between them had 29 institutional repositories. This represents 78.3% of the Spanish repositories listed in the BuscaRepositorios directory. Spanish repositories contained mainly full-text materials (journal articles and doctoral theses) together with metadata. The software most used was DSpace, followed by EPrints. The metadata standard most used was Dublin Core. Spanish repositories offered more usage statistics and fewer author-oriented services than the European average. The priorities for the future development of the repositories are the need for clear policies on access to publicly funded scientific output and the need for quality control indicators. Conclusions. This is the first detailed study of Spanish institutional repositories. The key stimulants for establishing, filling and maintaining repositories were, in order of importance, the increase in visibility and citation, the interest of decision-makers, simplicity of use and search services. On the other hand, the main inhibitors identified were the absence of policies, the lack of integration with other national and international systems and the lack of awareness-raising efforts in academia.
Abstract:
Prediction of species' distributions is central to diverse applications in ecology, evolution and conservation science. There is increasing electronic access to vast sets of occurrence records in museums and herbaria, yet little effective guidance on how best to use this information in the context of numerous approaches for modelling distributions. To meet this need, we compared 16 modelling methods over 226 species from 6 regions of the world, creating the most comprehensive set of model comparisons to date. We used presence-only data to fit models, and independent presence-absence data to evaluate the predictions. Along with well-established modelling methods such as generalised additive models, GARP and BIOCLIM, we explored methods that either have been developed recently or have rarely been applied to modelling species' distributions. These include machine-learning methods and community models, both of which have features that may make them particularly well suited to noisy or sparse information, as is typical of species' occurrence data. Presence-only data were effective for modelling species' distributions for many species and regions. The novel methods consistently outperformed more established methods. The results of our analysis are promising for the use of data from museums and herbaria, especially as methods suited to the noise inherent in such data improve.
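As a purely illustrative sketch of the evaluation setup described above (fit on presence-only records contrasted with background points, then score on independent presence-absence data), the following Python example uses a plain logistic regression as a stand-in for the sixteen methods compared in the study; all data, covariates and parameters are invented.

# Hypothetical sketch: fit on presence-only data (presences vs. background points),
# evaluate on independent presence-absence data with AUC. Not the study's actual code.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Fake environmental covariates (e.g. temperature, precipitation) at occurrence points.
presence_env = rng.normal(loc=1.0, size=(200, 2))      # presence-only records
background_env = rng.normal(loc=0.0, size=(1000, 2))   # random background sample

X_train = np.vstack([presence_env, background_env])
y_train = np.concatenate([np.ones(200), np.zeros(1000)])
model = LogisticRegression().fit(X_train, y_train)

# Independent presence-absence evaluation data (also invented).
eval_env = rng.normal(loc=0.5, size=(300, 2))
eval_labels = (eval_env.sum(axis=1) + rng.normal(scale=0.5, size=300) > 1.0).astype(int)

scores = model.predict_proba(eval_env)[:, 1]
print("AUC on presence-absence data:", roc_auc_score(eval_labels, scores))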
Abstract:
It is widely known that informal contacts and networks constitute a major advantage when searching for a job. Unemployed people are likely to benefit from such informal contacts, but building and sustaining a network can be particularly difficult when out of employment. Interventions that allow unemployed people to strengthen their networking capability could therefore be promising. Against this background, this article offers some indications of the direction such interventions could take. First, on the basis of data collected on a sample of 4,600 newly unemployed people in the Swiss Canton of Vaud, it looks at the factors that influence jobseekers' decisions to turn to informal contacts in their job search. The article shows that many unemployed people do not make use of their networks because they are unaware of the importance of this method. Second, it presents an impact analysis of an innovative intervention designed to raise awareness of the importance of networks, tested in a randomized controlled trial setting.
Abstract:
SUMMARY: ExpressionView is an R package that provides an interactive graphical environment for exploring transcription modules identified in gene expression data. A sophisticated ordering algorithm is used to present the modules and the underlying expression data in a visually appealing layout that provides an intuitive summary of the results. From this overview, the user can select individual modules and access the biologically relevant metadata associated with them. AVAILABILITY: http://www.unil.ch/cbg/ExpressionView. Screenshots, tutorials and sample data sets can be found on the ExpressionView web site.
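As a loose illustration of the general idea of reordering an expression matrix so that a module appears as a contiguous block (this is not ExpressionView's ordering algorithm, and all indices and data below are invented), a minimal Python sketch:

# Hypothetical illustration only: reorder a genes x samples matrix so that the members
# of one module occupy the top-left corner of the displayed matrix.
import numpy as np

expr = np.random.rand(6, 5)                 # genes x samples, made-up data
module_genes = [1, 4]                       # indices of genes in one module (invented)
module_samples = [0, 3]                     # indices of samples in the same module

gene_order = module_genes + [g for g in range(expr.shape[0]) if g not in module_genes]
sample_order = module_samples + [s for s in range(expr.shape[1]) if s not in module_samples]

reordered = expr[np.ix_(gene_order, sample_order)]   # module now sits in the top-left corner
print(reordered.shape)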
Abstract:
The coverage and volume of geo-referenced datasets are extensive and incessantly growing. The systematic capture of geo-referenced information generates large volumes of spatio-temporal data to be analyzed. Clustering and visualization play a key role in exploratory data analysis and in the extraction of knowledge embedded in these data. However, new challenges in visualization and clustering are posed by the special characteristics of such data: complex structures, large numbers of samples, variables involved in a temporal context, high dimensionality and large variability in cluster shapes.

The central aim of my thesis is to propose new algorithms and methodologies for clustering and visualization, in order to assist knowledge extraction from spatio-temporal geo-referenced data and thus improve decision-making processes. I present two original algorithms, one for clustering, the Fuzzy Growing Hierarchical Self-Organizing Networks (FGHSON), and one for exploratory visual data analysis, the Tree-structured Self-Organizing Maps Component Planes. In addition, I present methodologies that, combined with the FGHSON and the Tree-structured SOM Component Planes, allow space and time to be integrated seamlessly and simultaneously in order to extract knowledge embedded in a temporal context.

The originality of the FGHSON lies in its capability to reflect the underlying structure of a dataset in a hierarchical, fuzzy way. A hierarchical fuzzy representation of clusters is crucial when data include complex structures with large variability of cluster shapes, variances, densities and numbers of clusters. The most important characteristics of the FGHSON are: (1) it does not require an a priori setup of the number of clusters; (2) the algorithm executes several self-organizing processes in parallel, so when dealing with large datasets the processes can be distributed, reducing the computational cost; and (3) only three parameters are necessary to set up the algorithm. In the case of the Tree-structured SOM Component Planes, the novelty lies in the ability to create a structure that allows visual exploratory analysis of large, high-dimensional datasets. This algorithm creates a hierarchical structure of Self-Organizing Map Component Planes, arranging similar variables' projections in the same branches of the tree. Hence, similarities in variables' behavior can be easily detected (e.g. local correlations, maximal and minimal values, and outliers). Both the FGHSON and the Tree-structured SOM Component Planes were applied to several agroecological problems, proving to be very efficient for the exploratory analysis and clustering of spatio-temporal datasets.

In this thesis I also tested three soft competitive learning algorithms: two well-known unsupervised soft competitive algorithms, namely the Self-Organizing Maps (SOMs) and the Growing Hierarchical Self-Organizing Maps (GHSOMs), and a third that is our original contribution, the FGHSON. Although these algorithms have been used in several areas, to my knowledge there is no work applying and comparing their performance on spatio-temporal geospatial data, as is presented in this thesis. I propose original methodologies to explore spatio-temporal geo-referenced datasets through time. Our approach uses time windows to capture temporal similarities and variations by means of the FGHSON clustering algorithm. The developed methodologies are used in two case studies: in the first, the objective was to find similar agroecozones through time, and in the second it was to find similar environmental patterns shifted in time. Several results presented in this thesis have led to new contributions to agroecological knowledge, for instance in sugar cane and blackberry production. Finally, in the framework of this thesis we developed several software tools: (1) a Matlab toolbox that implements the FGHSON algorithm, and (2) a program called BIS (Bio-inspired Identification of Similar agroecozones), an interactive graphical user interface tool that integrates the FGHSON algorithm with Google Earth in order to show zones with similar agroecological characteristics.
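For readers unfamiliar with soft competitive learning, the following minimal Python sketch shows a plain self-organizing map update loop, the kind of mechanism the SOM, GHSOM and FGHSON build on. It is a generic SOM, not the thesis's FGHSON implementation, and every parameter and dataset in it is invented.

# Minimal generic SOM training loop (illustrative only; not the FGHSON algorithm).
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((500, 3))                 # e.g. 500 geo-referenced samples, 3 variables
grid = rng.random((5, 5, 3))                # 5x5 map of prototype vectors

for epoch in range(20):
    lr = 0.5 * (1 - epoch / 20)             # decaying learning rate
    sigma = 2.0 * (1 - epoch / 20) + 0.5    # decaying neighbourhood radius
    for x in data:
        # best-matching unit (winner) for this sample
        dists = np.linalg.norm(grid - x, axis=2)
        bi, bj = np.unravel_index(np.argmin(dists), dists.shape)
        # pull the winner and its neighbours towards the sample
        for i in range(5):
            for j in range(5):
                h = np.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2 * sigma ** 2))
                grid[i, j] += lr * h * (x - grid[i, j])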
Abstract:
We examined sequence variation in the mitochondrial cytochrome b gene (1140 bp, n = 73) and control region (842-851 bp, n = 74) in the Eurasian harvest mouse (Micromys minutus (Pallas, 1771)), with samples drawn from across its range, from Western Europe to Japan. Phylogeographic analyses revealed region-specific haplotype groupings combined with overall low levels of inter-regional genetic divergence. Despite the enormous intervening distance, European and East Asian samples showed a net nucleotide divergence of only 0.36%. Based on an evolutionary rate for the cytochrome b gene of 2.4% per site per lineage per million years, the initial divergence time of these populations is estimated at around 80 000 years before present. Our findings are consistent with available fossil evidence that has recorded repeated cycles of extinction and recolonization of Europe by M. minutus through the Quaternary. The molecular data further suggest that recolonization occurred from refugia in the Central to East Asian region. Japanese haplotypes of M. minutus, with the exception of those from Tsushima Is., show limited nucleotide diversity (0.15%) compared with those found on the adjacent Korean Peninsula. This finding suggests recent colonization of the Japanese Archipelago, probably around the last glacial period, followed by rapid population growth.
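As a quick check of the quoted estimate: divergence accumulates along both lineages, so the divergence time is roughly t = d / (2r). A minimal Python calculation, assuming this standard molecular-clock formula is the one the authors applied:

d = 0.0036                  # net nucleotide divergence between European and East Asian samples (0.36%)
r = 0.024                   # cytochrome b rate: 2.4% per site per lineage per million years
t_million_years = d / (2 * r)
print(t_million_years * 1e6)  # ~75,000 years, consistent with the abstract's "around 80 000 years"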
Abstract:
Summary: The effect of concentrate feeding on milk production in field data collected from dairy herd recording farms
Abstract:
Summary
Abstract:
The emergence of powerful new technologies, the existence of large quantities of data, and increasing demands for the extraction of added value from these technologies and data have created a number of significant challenges for those charged with both corporate and information technology management. The possibilities are great, the expectations high, and the risks significant. Organisations seeking to employ cloud technologies and exploit the value of the data to which they have access, be this in the form of "Big Data" available from different external sources or data held within the organisation, in structured or unstructured formats, need to understand the risks involved in such activities. Data owners have responsibilities towards the subjects of the data and must also, frequently, demonstrate that they are in compliance with current standards, laws and regulations. This thesis sets out to explore the nature of the technologies that organisations might utilise, identify the most pertinent constraints and risks, and propose a framework for the management of data from discovery to external hosting that will allow the most significant risks to be managed through the definition, implementation, and performance of appropriate internal control activities.
Abstract:
This thesis examines how oversight bodies, as part of an access to information (ATI) policy, contribute to the achievement of the policy's objectives. The aim of the thesis is to see how oversight bodies and the work they do affect the implementation of their respective ATI policies and thereby contribute to the objectives of those policies, using a comparative case study approach. The thesis investigates how federal/central government level information commissioners in four jurisdictions - Germany, India, Scotland, and Switzerland - enforce their respective ATI policies, which tasks they carry out in addition to their enforcement duties, the challenges they face in their work and the ways they overcome these. Qualitative data were gathered from primary and secondary documents as well as from 37 semi-structured interviews with staff of the commissioners' offices, administrative officials whose job entails complying with ATI, people who have made ATI requests and appealed to their respective oversight body, and external experts who have studied ATI implementation in their particular jurisdiction. The thesis finds that the aspect of an oversight body's formal independence with the greatest impact on its work is resource control, and that although the powers granted by law set the framework for ensuring that the administration properly complies with the policy, the commissioner's leadership style - a component of informal independence - has more influence than formal attributes of independence in determining how resources are obtained and used, how staff set priorities and how they utilize the powers they are granted by law. The conclusion, therefore, is that an ATI oversight body's ability to contribute to the achievement of the policy's objectives is a function of three main factors: a. the commissioner's leadership style; b. the adequacy of resources and the degree of control the organization has over them; c. the powers granted and the exercise of discretion in using them. In effect, the thesis argues that it is difficult to pinpoint the value of the formal powers set out for the oversight body in the ATI law, and that commissioners' decisions on whether and how to use those powers matter more than their presumed strength. It also claims that the choices made by the commissioners and their staff regarding priorities and the use of powers are determined to a large extent by the adequacy of resources and the degree of control the organization has over those resources. In turn, how the head of the organization leads and manages the oversight body is crucial to both the adequacy of the organization's resources and the decisions made about the use of powers. Together, these three factors have a significant impact on the body's effectiveness in contributing to ATI objectives.