970 results for Datasets
Abstract:
[ES] The main objective of this master's thesis is the study of the thermal behaviour of the TriboLAB instrument during its stay on board the International Space Station, together with the comparison of that behaviour against the one predicted by the mathematical thermal models used in the design of its thermal control system. The work carried out has considerably deepened our knowledge of that behaviour. This will make it possible to offer other experimenters interested in placing their instruments on the external balconies of the International Space Station real information on the thermal behaviour of an instrument with the characteristics of TriboLAB under those conditions: information of great interest for the thermal control design of their instruments, especially now that the service life of the International Space Station has been extended to 2020. Thermal control of space equipment is a key aspect in guaranteeing its survival and correct operation under the extreme conditions found in space. Its mission is to keep the various components within their admissible temperature ranges, since otherwise they could not operate, or might not even survive, outside those temperatures. In addition, it has been possible to verify the applicability of several functional data analysis techniques to the study of the kind of data considered here. Likewise, the results of the thermal test campaign have been compared with the mathematical thermal models that guided the design of the thermal control system, models that are a fundamental piece in the thermal control design of any space instrument. This has made it possible to verify both the validity of the thermal control system designed for TriboLAB and the close agreement between the results of the mathematical thermal models and the temperatures recorded on the instrument. All of this has been carried out from the perspective of functional data analysis.
Abstract:
[ES] As part of this research project, the following final-year degree project was carried out:
Abstract:
This report was prepared for and funded by the Florida State Department of Environmental Protection with the encouragement of members from the Florida Ocean Alliance, the Florida Oceans and Coastal Resources Council and other groups with deep interests in the future of Florida's coast. It is a preliminary study of Florida's ocean and coastal economies based only on information currently found within the datasets of the National Ocean Economics Program (NOEP). It reflects only a portion of the value of Florida's coastal-related economy and should not be considered comprehensive. A more customized study based on the unique coastal and ocean-dependent economic activities of the State of Florida should be carried out to complete the picture of Florida's dependence upon its coasts. (PDF has 129 pages.)
Abstract:
For sign languages used by deaf communities, linguistic corpora have until recently been unavailable, due to the lack of a writing system and a written culture in these communities, and the very recent advent of digital video. Recent improvements in video and computer technology have now made larger sign language datasets possible; however, large sign language datasets that are fully machine-readable are still elusive. This is due to two challenges: (1) the inconsistencies that arise when signs are annotated by means of a spoken/written language, and (2) the fact that many parts of signed interaction are not necessarily fully composed of lexical signs (the equivalent of words), instead consisting of constructions that are less conventionalised. As sign language corpus building progresses, the potential for some standards in annotation is beginning to emerge, but before this project there had been no attempt to standardise these practices across corpora, which is required to be able to compare data cross-linguistically. This project thus had the following aims: (1) to develop annotation standards for glosses (lexical/word level); (2) to test their reliability and validity; and (3) to improve current software tools that facilitate a reliable workflow. Overall, the project aimed not only to set a standard for the whole field of sign language studies throughout the world but also to make significant advances toward two of the world's largest machine-readable datasets for sign languages, specifically the BSL Corpus (British Sign Language, http://bslcorpusproject.org) and the Corpus NGT (Sign Language of the Netherlands, http://www.ru.nl/corpusngt).
Abstract:
Scientific research revolves around the production, analysis, storage, management and re-use of data. Data sharing offers important benefits for scientific progress and the advancement of knowledge; however, several limitations and barriers to its general adoption remain in place. Probably the most important challenge is that data sharing is not yet common among scholars and is not yet seen as a regular scientific activity, although important efforts are being invested in promoting it. In addition, the commitment of scholars to citing data is relatively low. The most important problems and challenges regarding data metrics are closely tied to these more general problems of data sharing. The development of data metrics depends on the growth of data sharing practices; after all, metrics are nothing more than the registration of researchers' behaviour. At the same time, the availability of proper metrics can help researchers make their data work more visible. This may subsequently act as an incentive for more data sharing, and in this way a virtuous circle may be set in motion. This report seeks to further explore the possibilities of metrics for datasets (i.e. the creation of reliable data metrics) and of an effective reward system that aligns the interests of the main stakeholders involved in the process. The report reviews the current literature on data sharing and data metrics, presents interviews with the main stakeholders, and analyses the existing repositories and tools in the field of data sharing that are of special relevance for the promotion and development of data metrics. On the basis of these three pillars, the report presents a number of solutions and necessary developments, as well as a set of recommendations regarding data metrics. The most important recommendations include the general adoption of data sharing and data publication among scholars; the development of a reward system for scientists that includes data metrics; reducing the costs of data publication; reducing researchers' existing negative cultural perceptions of data publication; developing standards for the preservation, publication, identification and citation of datasets; more coordination of data repository initiatives; and further development of interoperability protocols across different actors.
Abstract:
Authority files serve to uniquely identify real-world 'things' or entities, such as documents, persons and organisations, together with their properties, relations and features. Already important in the classical library world, authority files are indispensable for adequate information retrieval and analysis in the computer age, because, even more than humans, computers are poor at handling ambiguity. Through authority files, people tell computers which terms, names or numbers refer to the same thing or have the same meaning, by giving equivalent notions the same identifier. Authority files thus signpost the internet, where these identifiers are interlinked on the basis of relevance. When executing a query, computers can navigate from identifier to identifier by following these links and collect the queried information along these so-called 'crosswalks'. In this context, identifiers also go under the name of controlled access points. Identifiers become even more crucial now that massive data collections, such as library catalogues or research datasets, are releasing their hitherto contained data directly to the internet. This development has been coined Linked Open Data. The corresponding name for the internet is the Web of Data, as opposed to the classical Web of Documents.
Abstract:
The possibilities of digital research have altered the production, publication and use of research results. Academic research practice and culture are changing or have already been transformed, but to a large degree the system of academic recognition has not yet adapted to the practices and possibilities of digital research. This applies especially to research data, which are increasingly produced, managed, published and archived, but hardly play a role yet in research assessment. The aim of the workshop was to bring together experts and stakeholders from research institutions, universities, scholarly societies and funding agencies in order to review, discuss and build on possibilities to implement the culture of sharing and to integrate the publication of data into research assessment procedures. The report 'The Value of Research Data - Metrics for datasets from a cultural and technical point of view' was presented and discussed. Some of the key findings were that data sharing should be considered normal research practice; in fact, not sharing should be considered malpractice. Research funders and universities should support and encourage data sharing. There are a number of important aspects to consider when making data count in research and evaluation procedures. Metrics are a necessary tool in monitoring the sharing of datasets; however, data metrics are at present not very well developed, and there is not yet enough experience of what these metrics actually mean. It is important to implement the culture of sharing through codes of conduct in the scientific communities. For further key findings, please read the report.
Abstract:
[EN] This paper is an outcome of the ERASMUS IP program called TOPCART; more information about this project can be accessed from the following item:
Abstract:
[EN] Measuring semantic similarity and relatedness between textual items (words, sentences, paragraphs or even documents) is a very important research area in Natural Language Processing (NLP). In fact, it has many practical applications in other NLP tasks, for instance Word Sense Disambiguation, Textual Entailment, Paraphrase Detection, Machine Translation, Summarization, and related tasks such as Information Retrieval or Question Answering. In this master's thesis we study different approaches to computing the semantic similarity between textual items. In the framework of the European PATHS project, we also evaluate a knowledge-based method on a dataset of cultural item descriptions. Additionally, we describe the work carried out for the Semantic Textual Similarity (STS) shared task of SemEval-2012. This work involved supporting the creation of datasets for similarity tasks, as well as the organization of the task itself.
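As a rough illustration of the simplest family of approaches to this problem, the following Python sketch (a baseline only, not the knowledge-based method evaluated in the thesis) scores two sentences by the cosine similarity of their bag-of-words vectors:

from collections import Counter
import math

def cosine_similarity(text_a: str, text_b: str) -> float:
    # Bag-of-words vectors: token -> count.
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(va[w] * vb[w] for w in set(va) & set(vb))
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

# Example: two paraphrased descriptions of a (hypothetical) cultural item.
print(cosine_similarity("a painting of a gothic cathedral",
                        "an oil painting showing a gothic cathedral"))

Knowledge-based methods replace these surface counts with word-relatedness measures drawn from a lexical resource such as WordNet, which lets them recognise similarity between texts that share little or no vocabulary.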
Abstract:
Background: Colorectal cancer (CRC) is a disease of complex aetiology, with much of the expected inherited risk being due to several common low-risk variants. Genome-Wide Association Studies (GWAS) have identified 20 CRC risk variants. Nevertheless, these have only been able to explain part of the missing heritability. Moreover, these signals have only been inspected in populations of Northern European origin. Results: We therefore followed the same approach in a Spanish cohort of 881 cases and 667 controls. Sixty-four variants at 24 loci were found to be associated with CRC at p-values < 10^-5. We then evaluated the 24 loci in another Spanish replication cohort (1481 cases and 1850 controls). Two of these SNPs, rs12080929 at 1p33 (P-replication = 0.042; P-pooled = 5.523×10^-3; OR (95% CI) = 0.866 (0.782-0.959)) and rs11987193 at 8p12 (P-replication = 0.039; P-pooled = 6.985×10^-5; OR (95% CI) = 0.786 (0.705-0.878)), were replicated in the second phase, although they did not reach genome-wide statistical significance. Conclusions: We have performed the first CRC GWAS in a Southern European population, and by these means we were able to identify two new susceptibility variants at the 1p33 and 8p12 loci. These two SNPs are located near the SLC5A9 and DUSP4 loci, respectively, which could be good functional candidates for the association signals. We therefore believe that these two markers constitute good candidates for CRC susceptibility loci and should be further evaluated in other, larger datasets. Moreover, should these two SNPs prove to be true susceptibility variants, they would reduce the missing heritability fraction of CRC.
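For readers unfamiliar with the statistics reported above, the following Python sketch shows how an odds ratio (OR) and its 95% confidence interval are derived from a 2x2 table of allele counts; the counts are illustrative, not the study's data:

import math

# Hypothetical 2x2 table: risk-allele carriers vs. non-carriers.
a, b = 300, 581   # cases:    carriers, non-carriers
c, d = 330, 520   # controls: carriers, non-carriers

odds_ratio = (a * d) / (b * c)
# Standard error of log(OR) by Woolf's method.
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lo = math.exp(math.log(odds_ratio) - 1.96 * se)
hi = math.exp(math.log(odds_ratio) + 1.96 * se)
print(f"OR = {odds_ratio:.3f} (95% CI {lo:.3f}-{hi:.3f})")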
Abstract:
In spite of over a century of research on cortical circuits, it is still unknown how many classes of cortical neurons exist. Neuronal classification has been a difficult problem because it is unclear what a neuronal cell class actually is and what the best characteristics to define one are. Recently, unsupervised classification using cluster analysis based on morphological, physiological or molecular characteristics, when applied to selected datasets, has provided quantitative and unbiased identification of distinct neuronal subtypes. However, better and more robust classification methods are needed for increasingly complex and larger datasets. We explored the use of affinity propagation, a recently developed unsupervised classification algorithm imported from machine learning, which gives a representative example, or exemplar, for each cluster. As a case study, we applied affinity propagation to a test dataset of 337 interneurons belonging to four subtypes, previously identified based on morphological and physiological characteristics. We found that affinity propagation correctly classified most of the neurons in a blind, non-supervised manner. In fact, using a combined anatomical/physiological dataset, our algorithm differentiated parvalbumin from somatostatin interneurons in 49 out of 50 cases. Affinity propagation could therefore be used in future studies to validly classify neurons, as a first step to help reverse engineer neural circuits.
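Affinity propagation is available off the shelf in standard machine-learning libraries; a minimal Python sketch (assuming scikit-learn, with a synthetic feature matrix standing in for the morphological/physiological measurements used in the study) shows how it returns both cluster labels and an exemplar cell per cluster:

import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical dataset: 100 cells x 8 morphological/physiological features.
features = rng.normal(size=(100, 8))

# Standardize so that no single measurement dominates the similarity.
X = StandardScaler().fit_transform(features)

# Affinity propagation picks the number of clusters itself and designates
# one representative data point (exemplar) per cluster.
ap = AffinityPropagation(random_state=0).fit(X)
print("clusters found:", len(ap.cluster_centers_indices_))
print("exemplar cell indices:", ap.cluster_centers_indices_)
print("labels of first 10 cells:", ap.labels_[:10])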
Abstract:
265 p.
Abstract:
Singular Value Decomposition (SVD) is a key linear algebraic operation in many scientific and engineering applications. In particular, many computational intelligence systems rely on machine learning methods involving high-dimensional datasets that have to be processed quickly for real-time adaptability. In this paper we describe a practical FPGA (Field Programmable Gate Array) implementation of an SVD processor for accelerating the solution of large LSE problems. The design approach has been comprehensive, from algorithmic refinement through numerical analysis to customization for an efficient hardware realization. The processing scheme rests on an adaptive vector rotation evaluator for error regularization that enhances convergence speed with no penalty on solution accuracy. The proposed architecture, which follows a data transfer scheme, is scalable and based on the interconnection of simple rotation units, which allows for a trade-off between occupied area and processing acceleration in the final implementation. This permits the SVD processor to be implemented on both low-cost and high-end FPGAs, according to the final application requirements.
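As a plain software reference for the operation such a processor accelerates (a sketch only; the paper's contribution is the hardware architecture, not this code), solving an overdetermined least-squares problem through the SVD looks as follows in Python:

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(500, 20))                  # 500 equations, 20 unknowns
x_true = rng.normal(size=20)
b = A @ x_true + 0.01 * rng.normal(size=500)    # noisy observations

# A = U diag(s) V^T, so the least-squares solution is x = V diag(1/s) U^T b.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_ls = Vt.T @ ((U.T @ b) / s)

print("max error vs. true coefficients:", np.max(np.abs(x_ls - x_true)))

In hardware, the decomposition itself is typically built from many simple plane rotations (as in Jacobi-type SVD algorithms), which is what makes an architecture of interconnected rotation units natural.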
Abstract:
The concept of seismogenic asperities and aseismic barriers has become a useful paradigm within which to understand the seismogenic behavior of major faults. Since asperities and barriers can be thought of as defining the potential rupture area of large megathrust earthquakes, it is important to identify their respective spatial extents, constrain their temporal longevity, and develop a physical understanding of their behavior. Space geodesy is making critical contributions to the identification of slip asperities and barriers, but progress in many geographical regions depends on improving the accuracy and precision of the basic measurements. This thesis begins with technical developments aimed at improving satellite radar interferometric measurements of ground deformation: we introduce an empirical correction algorithm for unwanted interferometric path delays caused by spatially and temporally variable radar wave propagation speeds in the atmosphere. In chapter 2, I combine geodetic datasets with complementary spatio-temporal resolutions to improve our understanding of the spatial distribution of crustal deformation sources and their associated temporal evolution; here we use observations from Long Valley Caldera (California) as our test bed. In the third chapter I apply the tools developed in the first two chapters to analyze postseismic deformation associated with the 2010 Mw=8.8 Maule (Chile) earthquake. The result delimits patches where afterslip occurs, explores their relationship to coseismic rupture, quantifies frictional properties associated with inferred patches of afterslip, and discusses the relationship of asperities and barriers to long-term topography. The final chapter investigates interseismic deformation of the eastern Makran subduction zone using satellite radar interferometry alone, and demonstrates that with state-of-the-art techniques it is possible to quantify tectonic signals of small amplitude and long wavelength. Portions of the eastern Makran for which we estimate low fault coupling correspond to areas where bathymetric features on the downgoing plate are presently subducting, whereas the region of the 1945 M=8.1 earthquake appears to be more strongly coupled.
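The abstract does not spell out the correction algorithm, but a minimal Python sketch of the widely used elevation-dependent form of empirical tropospheric corrections (synthetic per-pixel values and a simple linear delay model assumed) conveys the basic idea:

import numpy as np

rng = np.random.default_rng(2)
elevation = rng.uniform(0.0, 3000.0, size=10_000)        # m, one value per pixel
deformation = 0.002 * rng.normal(size=elevation.size)    # m, true ground signal
phase = deformation + 1.5e-5 * elevation                 # m, delay-contaminated

# Fit phase = k * elevation + c over the scene and subtract the trend,
# since the stratified part of the tropospheric delay tracks topography.
k, c = np.polyfit(elevation, phase, deg=1)
corrected = phase - (k * elevation + c)
print(f"fitted delay rate: {k:.2e} m of delay per m of elevation")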
Abstract:
This thesis describes the active structures of Myanmar and its surrounding regions, and the earthquake geology of the major active structures. Such investigation is needed urgently for this rapidly developing country, which has suffered destructive earthquakes throughout its long history. To achieve a better understanding of the regional active tectonics and of future seismic potential, we utilized a global digital elevation model and optical satellite imagery to describe geomorphologic evidence for the principal neotectonic features of the western half of the Southeast Asian mainland. Our investigation shows three distinct active structural systems that accommodate the oblique convergence between the Indian plate and Southeast Asia and the extrusion of Asian territory around the eastern syntaxis of the Himalayan mountain range. Each of these active deformation belts can be further separated into several neotectonic domains, in which the structures show active behaviors distinct from one domain to another.
In order to better understand the behaviors of active structures, we focused in the second part of this thesis on the active characteristics of the right-lateral Sagaing fault and the obliquely subducting northern Sunda megathrust. The detailed geomorphic investigations along these two major plate-interface faults revealed the recent slip behavior of these structures and plausible recurrence intervals of major seismic events. We also documented the ground deformation of the 2011 Tarlay earthquake in remote eastern Myanmar from remote sensing datasets and post-earthquake field investigations. The field observations and remote sensing measurements of the surface ruptures of the Tarlay earthquake constitute the first study of this kind in the Myanmar region.