Biblioteca Digital

861 resultados para multiple data sources

STORM - a novel information fusion and cluster interpretation technique

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Analysis of data without labels is commonly subject to scrutiny by unsupervised machine learning techniques. Such techniques provide more meaningful representations, useful for better understanding of a problem at hand, than by looking only at the data itself. Although abundant expert knowledge exists in many areas where unlabelled data is examined, such knowledge is rarely incorporated into automatic analysis. Incorporation of expert knowledge is frequently a matter of combining multiple data sources from disparate hypothetical spaces. In cases where such spaces belong to different data types, this task becomes even more challenging. In this paper we present a novel immune-inspired method that enables the fusion of such disparate types of data for a specific set of problems. We show that our method provides a better visual understanding of one hypothetical space with the help of data from another hypothetical space. We believe that our model has implications for the field of exploratory data analysis and knowledge discovery.

A graph theoretic approach to testing associations between disparate sources of functional genomic data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The last few years have seen the advent of high-throughput technologies to analyze various properties of the transcriptome and proteome of several organisms. The congruency of these different data sources, or lack thereof, can shed light on the mechanisms that govern cellular function. A central challenge for bioinformatics research is to develop a unified framework for combining the multiple sources of functional genomics information and testing associations between them, thus obtaining a robust and integrated view of the underlying biology. We present a graph theoretic approach to test the significance of the association between multiple disparate sources of functional genomics data by proposing two statistical tests, namely edge permutation and node label permutation tests. We demonstrate the use of the proposed tests by finding significant association between a Gene Ontology-derived "predictome" and data obtained from mRNA expression and phenotypic experiments for Saccharomyces cerevisiae. Moreover, we employ the graph theoretic framework to recast a surprising discrepancy presented in Giaever et al. (2002) between gene expression and knockout phenotype, using expression data from a different set of experiments.

Data queries over heterogeneous sources

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Dissertação para obtenção do Grau de Mestre em Engenharia Informática

Predicting cancer involvement of genes from heterogeneous data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Systematic approaches for identifying proteins involved in different types of cancer are needed. Experimental techniques such as microarrays are being used to characterize cancer, but validating their results can be a laborious task. Computational approaches are used to prioritize between genes putatively involved in cancer, usually based on further analyzing experimental data. Results: We implemented a systematic method using the PIANA software that predicts cancer involvement of genes by integrating heterogeneous datasets. Specifically, we produced lists of genes likely to be involved in cancer by relying on: (i) protein-protein interactions; (ii) differential expression data; and (iii) structural and functional properties of cancer genes. The integrative approach that combines multiple sources of data obtained positive predictive values ranging from 23% (on a list of 811 genes) to 73% (on a list of 22 genes), outperforming the use of any of the data sources alone. We analyze a list of 20 cancer gene predictions, finding that most of them have been recently linked to cancer in literature. Conclusion: Our approach to identifying and prioritizing candidate cancer genes can be used to produce lists of genes likely to be involved in cancer. Our results suggest that differential expression studies yielding high numbers of candidate cancer genes can be filtered using protein interaction networks.

Tracking Invasion Histories in the Sea: Facing Complex Scenarios Using Multilocus Data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In recent years, new analytical tools have allowed researchers to extract historical information contained in molecular data, which has fundamentally transformed our understanding of processes ruling biological invasions. However, the use of these new analytical tools has been largely restricted to studies of terrestrial organisms despite the growing recognition that the sea contains ecosystems that are amongst the most heavily affected by biological invasions, and that marine invasion histories are often remarkably complex. Here, we studied the routes of invasion and colonisation histories of an invasive marine invertebrate Microcosmus squamiger (Ascidiacea) using microsatellite loci, mitochondrial DNA sequence data and 11 worldwide populations. Discriminant analysis of principal components, clustering methods and approximate Bayesian computation (ABC) methods showed that the most likely source of the introduced populations was a single admixture event that involved populations from two genetically differentiated ancestral regions - the western and eastern coasts of Australia. The ABC analyses revealed that colonisation of the introduced range of M. squamiger consisted of a series of non-independent introductions along the coastlines of Africa, North America and Europe. Furthermore, we inferred that the sequence of colonisation across continents was in line with historical taxonomic records - first the Mediterranean Sea and South Africa from an unsampled ancestral population, followed by sequential introductions in California and, more recently, the NE Atlantic Ocean. We revealed the most likely invasion history for world populations of M. squamiger, which is broadly characterized by the presence of multiple ancestral sources and non-independent introductions within the introduced range. The results presented here illustrate the complexity of marine invasion routes and identify a cause-effect relationship between human-mediated transport and the success of widespread marine non-indigenous species, which benefit from stepping-stone invasions and admixture processes involving different sources for the spread and expansion of their range.

Modelling ecological values in heterogeneous and dynamic landscapes with geospatial data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Our surrounding landscape is in a constantly dynamic state, but recently the rate of changes and their effects on the environment have considerably increased. In terms of the impact on nature, this development has not been entirely positive, but has rather caused a decline in valuable species, habitats, and general biodiversity. Regardless of recognizing the problem and its high importance, plans and actions of how to stop the detrimental development are largely lacking. This partly originates from a lack of genuine will, but is also due to difficulties in detecting many valuable landscape components and their consequent neglect. To support knowledge extraction, various digital environmental data sources may be of substantial help, but only if all the relevant background factors are known and the data is processed in a suitable way. This dissertation concentrates on detecting ecologically valuable landscape components by using geospatial data sources, and applies this knowledge to support spatial planning and management activities. In other words, the focus is on observing regionally valuable species, habitats, and biotopes with GIS and remote sensing data, using suitable methods for their analysis. Primary emphasis is given to the hemiboreal vegetation zone and the drastic decline in its semi-natural grasslands, which were created by a long trajectory of traditional grazing and management activities. However, the applied perspective is largely methodological, and allows for the application of the obtained results in various contexts. Models based on statistical dependencies and correlations of multiple variables, which are able to extract desired properties from a large mass of initial data, are emphasized in the dissertation. In addition, the papers included combine several data sets from different sources and dates together, with the aim of detecting a wider range of environmental characteristics, as well as pointing out their temporal dynamics. The results of the dissertation emphasise the multidimensionality and dynamics of landscapes, which need to be understood in order to be able to recognise their ecologically valuable components. This not only requires knowledge about the emergence of these components and an understanding of the used data, but also the need to focus the observations on minute details that are able to indicate the existence of fragmented and partly overlapping landscape targets. In addition, this pinpoints the fact that most of the existing classifications are too generalised as such to provide all the required details, but they can be utilized at various steps along a longer processing chain. The dissertation also emphases the importance of landscape history as an important factor, which both creates and preserves ecological values, and which sets an essential standpoint for understanding the present landscape characteristics. The obtained results are significant both in terms of preserving semi-natural grasslands, as well as general methodological development, giving support to science-based framework in order to evaluate ecological values and guide spatial planning.

Exploring teachers' use of multiple intelligences with elementary-aged exceptional students

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The purpose ofthis qualitative study was to explore teachers' reflections on Multiple Intelligences theory and the processes they engage in when using the theory with elementary-aged exceptional students. FOllr public school teachers took part in the study. An introductory observation visit, semistructured in-depth interviews, field notes, and teachers' own written reflections served as data sources. Content-analysis was applied to review the data for thenles related to the research topic. The findings indicated several benefits of using Multiple Intelligences. This tlleory appeared to affect teachers' views of exceptionalleamers, directing the teachers' fOClIS to the students' potentials. It also seemed to have value for assisting teachers in planning an inclusive approacll, enhancing exceptional students' self-esteem, developing nletacognition, and prolTIoting cognitive engagement. Finally, the findings suggest that Multiple Intelligences has inlplications for teachers' professional development to reach a more diverse range of students.

Multiplicity of data in trial reports and the reliability of meta-analyses: empirical study

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Objectives To examine the extent of multiplicity of data in trial reports and to assess the impact of multiplicity on meta-analysis results. Design Empirical study on a cohort of Cochrane systematic reviews. Data sources All Cochrane systematic reviews published from issue 3 in 2006 to issue 2 in 2007 that presented a result as a standardised mean difference (SMD). We retrieved trial reports contributing to the first SMD result in each review, and downloaded review protocols. We used these SMDs to identify a specific outcome for each meta-analysis from its protocol. Review methods Reviews were eligible if SMD results were based on two to ten randomised trials and if protocols described the outcome. We excluded reviews if they only presented results of subgroup analyses. Based on review protocols and index outcomes, two observers independently extracted the data necessary to calculate SMDs from the original trial reports for any intervention group, time point, or outcome measure compatible with the protocol. From the extracted data, we used Monte Carlo simulations to calculate all possible SMDs for every meta-analysis. Results We identified 19 eligible meta-analyses (including 83 trials). Published review protocols often lacked information about which data to choose. Twenty-four (29%) trials reported data for multiple intervention groups, 30 (36%) reported data for multiple time points, and 29 (35%) reported the index outcome measured on multiple scales. In 18 meta-analyses, we found multiplicity of data in at least one trial report; the median difference between the smallest and largest SMD results within a meta-analysis was 0.40 standard deviation units (range 0.04 to 0.91). Conclusions Multiplicity of data can affect the findings of systematic reviews and meta-analyses. To reduce the risk of bias, reviews and meta-analyses should comply with prespecified protocols that clearly identify time points, intervention groups, and scales of interest.

Accuracy of magnetic resonance imaging for the diagnosis of multiple sclerosis: systematic review

Relevância:

100.00% 100.00%

Publicador:

Resumo:

OBJECTIVE: To determine the accuracy of magnetic resonance imaging criteria for the early diagnosis of multiple sclerosis in patients with suspected disease. DESIGN: Systematic review. DATA SOURCES: 12 electronic databases, citation searches, and reference lists of included studies. Review methods Studies on accuracy of diagnosis that compared magnetic resonance imaging, or diagnostic criteria incorporating such imaging, to a reference standard for the diagnosis of multiple sclerosis. RESULTS: 29 studies (18 cohort studies, 11 other designs) were included. On average, studies of other designs (mainly diagnostic case-control studies) produced higher estimated diagnostic odds ratios than did cohort studies. Among 15 studies of higher methodological quality (cohort design, clinical follow-up as reference standard), those with longer follow-up produced higher estimates of specificity and lower estimates of sensitivity. Only two such studies followed patients for more than 10 years. Even in the presence of many lesions (> 10 or > 8), magnetic resonance imaging could not accurately rule multiple sclerosis in (likelihood ratio of a positive test result 3.0 and 2.0, respectively). Similarly, the absence of lesions was of limited utility in ruling out a diagnosis of multiple sclerosis (likelihood ratio of a negative test result 0.1 and 0.5). CONCLUSIONS: Many evaluations of the accuracy of magnetic resonance imaging for the early detection of multiple sclerosis have produced inflated estimates of test performance owing to methodological weaknesses. Use of magnetic resonance imaging to confirm multiple sclerosis on the basis of a single attack of neurological dysfunction may lead to over-diagnosis and over-treatment.

Data normalization and integration in robotic systems using web services technologies

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The robotics is one of the most active areas. We also need to join a large number of disciplines to create robots. With these premises, one problem is the management of information from multiple heterogeneous sources. Each component, hardware or software, produces data with different nature: temporal frequencies, processing needs, size, type, etc. Nowadays, technologies and software engineering paradigms such as service-oriented architectures are applied to solve this problem in other areas. This paper proposes the use of these technologies to implement a robotic control system based on services. This type of system will allow integration and collaborative work of different elements that make up a robotic system.

Approaches and Strategy for Cancer Research and Surveillance Data: Integration, Information Pipeline, Data Models, and Informatics Opportunities

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Thesis (Ph.D.)--University of Washington, 2016-04

Digital communication networks incorporating mobile data terminals

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The use of digital communication systems is increasing very rapidly. This is due to lower system implementation cost compared to analogue transmission and at the same time, the ease with which several types of data sources (data, digitised speech and video, etc.) can be mixed. The emergence of packet broadcast techniques as an efficient type of multiplexing, especially with the use of contention random multiple access protocols, has led to a wide-spread application of these distributed access protocols in local area networks (LANs) and a further extension of them to radio and mobile radio communication applications. In this research, a proposal for a modified version of the distributed access contention protocol which uses the packet broadcast switching technique has been achieved. The carrier sense multiple access with collision avoidance (CSMA/CA) is found to be the most appropriate protocol which has the ability to satisfy equally the operational requirements for local area networks as well as for radio and mobile radio applications. The suggested version of the protocol is designed in a way in which all desirable features of its precedents is maintained. However, all the shortcomings are eliminated and additional features have been added to strengthen its ability to work with radio and mobile radio channels. Operational performance evaluation of the protocol has been carried out for the two types of non-persistent and slotted non-persistent, through mathematical and simulation modelling of the protocol. The results obtained from the two modelling procedures validate the accuracy of both methods, which compares favourably with its precedent protocol CSMA/CD (with collision detection). A further extension of the protocol operation has been suggested to operate with multichannel systems. Two multichannel systems based on the CSMA/CA protocol for medium access are therefore proposed. These are; the dynamic multichannel system, which is based on two types of channel selection, the random choice (RC) and the idle choice (IC), and the sequential multichannel system. The latter has been proposed in order to supress the effect of the hidden terminal, which always represents a major problem with the usage of the contention random multiple access protocols with radio and mobile radio channels. Verification of their operation performance evaluation has been carried out using mathematical modelling for the dynamic system. However, simulation modelling has been chosen for the sequential system. Both systems are found to improve system operation and fault tolerance when compared to single channel operation.

The integration of multi-source data to improve the classification of remotely sensed images

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The number of remote sensing platforms and sensors rises almost every year, yet much work on the interpretation of land cover is still carried out using either single images or images from the same source taken at different dates. Two questions could be asked of this proliferation of images: can the information contained in different scenes be used to improve the classification accuracy and, what is the best way to combine the different imagery? Two of these multiple image sources are MODIS on the Terra platform and ETM+ on board Landsat7, which are suitably complementary. Daily MODIS images with 36 spectral bands in 250-1000 m spatial resolution and seven spectral bands of ETM+ with 30m and 16 days spatial and temporal resolution respectively are available. In the UK, cloud cover may mean that only a few ETM+ scenes may be available for any particular year and these may not be at the time of year of most interest. The MODIS data may provide information on land cover over the growing season, such as harvest dates, that is not present in the ETM+ data. Therefore, the primary objective of this work is to develop a methodology for the integration of medium spatial resolution Landsat ETM+ image, with multi-temporal, multi-spectral, low-resolution MODIS \Terra images, with the aim of improving the classification of agricultural land. Additionally other data may also be incorporated such as field boundaries from existing maps. When classifying agricultural land cover of the type seen in the UK, where crops are largely sown in homogenous fields with clear and often mapped boundaries, the classification is greatly improved using the mapped polygons and utilising the classification of the polygon as a whole as an apriori probability in classifying each individual pixel using a Bayesian approach. When dealing with multiple images from different platforms and dates it is highly unlikely that the pixels will be exactly co-registered and these pixels will contain a mixture of different real world land covers. Similarly the different atmospheric conditions prevailing during the different days will mean that the same emission from the ground will give rise to different sensor reception. Therefore, a method is presented with a model of the instantaneous field of view and atmospheric effects to enable different remote sensed data sources to be integrated.

The use of Meteosat satellite data for spatial rainfall estimations and hydrological simulations

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Satellite information, in combination with conventional point source measurements, can be a valuable source of information. This thesis is devoted to the spatial estimation of areal rainfall over a region using both the measurements from a dense and sparse network of rain-gauges and images from the meteorological satellites. A primary concern is to study the effects of such satellite assisted rainfall estimates on the performance of rainfall-runoff models. Low-cost image processing systems and peripherals are used to process and manipulate the data. Both secondary as well as primary satellite images were used for analysis. The secondary data was obtained from the in-house satellite receiver and the primary data was obtained from an outside source. Ground truth data was obtained from the local Water Authority. A number of algorithms are presented that combine the satellite and conventional data sources to produce areal rainfall estimates and the results are compared with some of the more traditional methodologies. The results indicate that the satellite cloud information is valuable in the assessment of the spatial distribution of areal rainfall, for both half-hourly as well as daily estimates of rainfall. It is also demonstrated how the performance of the simple multiple regression rainfall-runoff model is improved when satellite cloud information is used as a separate input in addition to rainfall estimates from conventional means. The use of low-cost equipment, from image processing systems to satellite imagery, makes it possible for developing countries to introduce such systems in areas where the benefits are greatest.

Scaling up question-answering to linked data

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Linked Data semantic sources, in particular DBpedia, can be used to answer many user queries. PowerAqua is an open multi-ontology Question Answering (QA) system for the Semantic Web (SW). However, the emergence of Linked Data, characterized by its openness, heterogeneity and scale, introduces a new dimension to the Semantic Web scenario, in which exploiting the relevant information to extract answers for Natural Language (NL) user queries is a major challenge. In this paper we discuss the issues and lessons learned from our experience of integrating PowerAqua as a front-end for DBpedia and a subset of Linked Data sources. As such, we go one step beyond the state of the art on end-users interfaces for Linked Data by introducing mapping and fusion techniques needed to translate a user query by means of multiple sources. Our first informal experiments probe whether, in fact, it is feasible to obtain answers to user queries by composing information across semantic sources and Linked Data, even in its current form, where the strength of Linked Data is more a by-product of its size than its quality. We believe our experiences can be extrapolated to a variety of end-user applications that wish to scale, open up, exploit and re-use what possibly is the greatest wealth of data about everything in the history of Artificial Intelligence. © 2010 Springer-Verlag.

«
1
2
3
4
5
6
7
8
...
57
58
»