942 resultados para data centric research
Predicting sense of community and participation by applying machine learning to open government data
Resumo:
Community capacity is used to monitor socio-economic development. It is composed of a number of dimensions, which can be measured to understand the possible issues in the implementation of a policy or the outcome of a project targeting a community. Measuring community capacity dimensions is usually expensive and time consuming, requiring locally organised surveys. Therefore, we investigate a technique to estimate them by applying the Random Forests algorithm on secondary open government data. This research focuses on the prediction of measures for two dimensions: sense of community and participation. The most important variables for this prediction were determined. The variables included in the datasets used to train the predictive models complied with two criteria: nationwide availability; sufficiently fine-grained geographic breakdown, i.e. neighbourhood level. The models explained 77% of the sense of community measures and 63% of participation. Due to the low geographic detail of the outcome measures available, further research is required to apply the predictive models to a neighbourhood level. The variables that were found to be more determinant for prediction were only partially in agreement with the factors that, according to the social science literature consulted, are the most influential for sense of community and participation. This finding should be further investigated from a social science perspective, in order to be understood in depth.
Resumo:
We have designed and implemented a low-cost digital system using closed-circuit television cameras coupled to a digital acquisition system for the recording of in vivo behavioral data in rodents and for allowing observation and recording of more than 10 animals simultaneously at a reduced cost, as compared with commercially available solutions. This system has been validated using two experimental rodent models: one involving chemically induced seizures and one assessing appetite and feeding. We present observational results showing comparable or improved levels of accuracy and observer consistency between this new system and traditional methods in these experimental models, discuss advantages of the presented system over conventional analog systems and commercially available digital systems, and propose possible extensions to the system and applications to nonrodent studies.
Tropical stratospheric zonal winds in ECMWF ERA-40 reanalysis, rocketsonde data, and rawinsonde data
Resumo:
We have designed and implemented a low-cost digital system using closed-circuit television cameras coupled to a digital acquisition system for the recording of in vivo behavioral data in rodents and for allowing observation and recording of more than 10 animals simultaneously at a reduced cost, as compared with commercially available solutions. This system has been validated using two experimental rodent models: one involving chemically induced seizures and one assessing appetite and feeding. We present observational results showing comparable or improved levels of accuracy and observer consistency between this new system and traditional methods in these experimental models, discuss advantages of the presented system over conventional analog systems and commercially available digital systems, and propose possible extensions to the system and applications to non-rodent studies.
Resumo:
Purpose - The purpose of this paper is to identify the most popular techniques used to rank a web page highly in Google. Design/methodology/approach - The paper presents the results of a study into 50 highly optimized web pages that were created as part of a Search Engine Optimization competition. The study focuses on the most popular techniques that were used to rank highest in this competition, and includes an analysis on the use of PageRank, number of pages, number of in-links, domain age and the use of third party sites such as directories and social bookmarking sites. A separate study was made into 50 non-optimized web pages for comparison. Findings - The paper provides insight into the techniques that successful Search Engine Optimizers use to ensure a page ranks highly in Google. Recognizes the importance of PageRank and links as well as directories and social bookmarking sites. Research limitations/implications - Only the top 50 web sites for a specific query were analyzed. Analysing more web sites and comparing with similar studies in different competition would provide more concrete results. Practical implications - The paper offers a revealing insight into the techniques used by industry experts to rank highly in Google, and the success or other-wise of those techniques. Originality/value - This paper fulfils an identified need for web sites and e-commerce sites keen to attract a wider web audience.
Resumo:
When performing data fusion, one often measures where targets were and then wishes to deduce where targets currently are. There has been recent research on the processing of such out-of-sequence data. This research has culminated in the development of a number of algorithms for solving the associated tracking problem. This paper reviews these different approaches in a common Bayesian framework and proposes an architecture that orthogonalises the data association and out-of-sequence problems such that any combination of solutions to these two problems can be used together. The emphasis is not on advocating one approach over another on the basis of computational expense, but rather on understanding the relationships among the algorithms so that any approximations made are explicit. Results for a multi-sensor scenario involving out-of-sequence data association are used to illustrate the utility of this approach in a specific context.
Resumo:
The use of online data is becoming increasingly essential for the generation of insight in today’s research environment. This reflects the much wider range of data available online and the key role that social media now plays in interpersonal communication. However, the process of gaining permission to use social media data for research purposes creates a number of significant issues when considering compatibility with professional ethics guidelines. This paper critically explores the application of existing informed consent policies to social media research and compares with the form of consent gained by the social networks themselves, which we label ‘uninformed consent’. We argue that, as currently constructed, informed consent carries assumptions about the nature of privacy that are not consistent with the way that consumers behave in an online environment. On the other hand, uninformed consent relies on asymmetric relationships that are unlikely to succeed in an environment based on co-creation of value. The paper highlights the ethical ambiguity created by current approaches for gaining customer consent, and proposes a new conceptual framework based on participative consent that allows for greater alignment between consumer privacy and ethical concerns.
Resumo:
The increased availability of digital elevation models and satellite image data enable testing of morphometric relationships between sand dune variables (dune height, spacing and equivalent sand thickness), which were originally established using limited field survey data. These long-established geomorphological hypotheses can now be tested against very much larger samples than were possible when available data were limited to what could be collected by field surveys alone. This project uses ASTER Global Digital Elevation Model (GDEM) data to compare morphometric relationships between sand dune variables in the southwest Kalahari dunefield to those of the Namib Sand Sea, to test whether the relationships found in an active sand sea (Namib) also hold for the fixed dune system of the nearby southwest Kalahari. The data show significant morphometric differences between the simple linear dunes of the Namib sand sea and the southwest Kalahari; the latter do not show the expected positive relationship between dune height and spacing. The southwest Kalahari dunes show a similar range of dune spacings, but they are less tall, on average, than the Namib sand sea dunes. There is a clear spatial pattern to these morphometric data; the tallest and most closely spaced dunes are towards the southeast of the Kalahari dunefield; and this is where the highest values of equivalent sand thickness result. We consider the possible reasons for the observed differences and highlight the need for more studies comparing sand seas and dunefields from different environmental settings.
Resumo:
Pode-se afirmar que a evolução tecnológica (desenvolvimento de novos instrumentos de medição como, softwares, satélites e computadores, bem como, o barateamento das mídias de armazenamento) permite às Organizações produzirem e adquirirem grande quantidade de dados em curto espaço de tempo. Devido ao volume de dados, Organizações de pesquisa se tornam potencialmente vulneráveis aos impactos da explosão de informações. Uma solução adotada por algumas Organizações é a utilização de ferramentas de sistemas de informação para auxiliar na documentação, recuperação e análise dos dados. No âmbito científico, essas ferramentas são desenvolvidas para armazenar diferentes padrões de metadados (dados sobre dados). Durante o processo de desenvolvimento destas ferramentas, destaca-se a adoção de padrões como a Linguagem Unificada de Modelagem (UML, do Inglês Unified Modeling Language), cujos diagramas auxiliam na modelagem de diferentes aspectos do software. O objetivo deste estudo é apresentar uma ferramenta de sistemas de informação para auxiliar na documentação dos dados das Organizações por meio de metadados e destacar o processo de modelagem de software, por meio da UML. Será abordado o Padrão de Metadados Digitais Geoespaciais, amplamente utilizado na catalogação de dados por Organizações científicas de todo mundo, e os diagramas dinâmicos e estáticos da UML como casos de uso, sequências e classes. O desenvolvimento das ferramentas de sistemas de informação pode ser uma forma de promover a organização e a divulgação de dados científicos. No entanto, o processo de modelagem requer especial atenção para o desenvolvimento de interfaces que estimularão o uso das ferramentas de sistemas de informação.
Resumo:
Diese Dissertation stellt das neu entwickelte SystemRelAndXML vor, das für das Management und dieSpeicherung von hypertextzentrierten XML-Dokumenten und dendazugehörenden XSL-Stylesheets spezialisiert ist. DerAnwendungsbereich sind die Vorlesungsmaterialien anUniversitäten. RelAndXML speichert die XML-formatiertenÜbungsblätter in Textbausteinen und weiterenTeilen in einer speziellen Datenbank.Die Speicherung von XML-Dokumenten in Datenbanken ist seiteinigen Jahren ein wichtiges Thema der Datenbankforschung.Ansätze dafür gliedern sich in solche fürdatenzentrierte und andere für dokumentenzentrierteDokumente. Die Dissertation präsentiert einen Ansatzzur Speicherung von hypertextzentrierten XML-Dokumenten, derAspekte von datenzentrierten und dokumentenzentriertenAnsätzen kombiniert. Der Ansatz erlaubt dieWiederverwendung von Textbausteinen und speichert dieReihenfolge dort, wo sie wichtig ist. Mit RelAndXML könnennicht nur Elemente gespeichert werden, wie mit einigenanderen Ansätzen, sondern auch Attribute, Kommentareund Processing Instructions. Algorithmen für dieFragmentierung und Rekonstruktion von Dokumenten werdenbereit gestellt.RelAndXML wurde mit Java und unter Verwendung einerobjektrelationalen Datenbank implementiert. Das System hateine graphische Benutzungsoberfläche, die das Erstellenund Verändern der XML- und XSL-Dokumente, dasEinfügen von neuen oder schon gespeichertenTextbausteinen sowie das Erzeugen von HTML-Dokumenten zurVeröffentlichung ermöglicht.
Resumo:
Jahnke and Asher explore workflows and methodologies at a variety of academic data curation sites, and Keralis delves into the academic milieu of library and information schools that offer instruction in data curation. Their conclusions point to the urgent need for a reliable and increasingly sophisticated professional cohort to support data-intensive research in our colleges, universities, and research centers.