832 resultados para databases and data mining


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The emergence of powerful new technologies, the existence of large quantities of data, and increasing demands for the extraction of added value from these technologies and data have created a number of significant challenges for those charged with both corporate and information technology management. The possibilities are great, the expectations high, and the risks significant. Organisations seeking to employ cloud technologies and exploit the value of the data to which they have access, be this in the form of "Big Data" available from different external sources or data held within the organisation, in structured or unstructured formats, need to understand the risks involved in such activities. Data owners have responsibilities towards the subjects of the data and must also, frequently, demonstrate that they are in compliance with current standards, laws and regulations. This thesis sets out to explore the nature of the technologies that organisations might utilise, identify the most pertinent constraints and risks, and propose a framework for the management of data from discovery to external hosting that will allow the most significant risks to be managed through the definition, implementation, and performance of appropriate internal control activities.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Digital information generates the possibility of a high degree of redundancy in the data available for fitting predictive models used for Digital Soil Mapping (DSM). Among these models, the Decision Tree (DT) technique has been increasingly applied due to its capacity of dealing with large datasets. The purpose of this study was to evaluate the impact of the data volume used to generate the DT models on the quality of soil maps. An area of 889.33 km² was chosen in the Northern region of the State of Rio Grande do Sul. The soil-landscape relationship was obtained from reambulation of the studied area and the alignment of the units in the 1:50,000 scale topographic mapping. Six predictive covariates linked to the factors soil formation, relief and organisms, together with data sets of 1, 3, 5, 10, 15, 20 and 25 % of the total data volume, were used to generate the predictive DT models in the data mining program Waikato Environment for Knowledge Analysis (WEKA). In this study, sample densities below 5 % resulted in models with lower power of capturing the complexity of the spatial distribution of the soil in the study area. The relation between the data volume to be handled and the predictive capacity of the models was best for samples between 5 and 15 %. For the models based on these sample densities, the collected field data indicated an accuracy of predictive mapping close to 70 %.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

ObjectiveCandidate genes for non-alcoholic fatty liver disease (NAFLD) identified by a bioinformatics approach were examined for variant associations to quantitative traits of NAFLD-related phenotypes.Research Design and MethodsBy integrating public database text mining, trans-organism protein-protein interaction transferal, and information on liver protein expression a protein-protein interaction network was constructed and from this a smaller isolated interactome was identified. Five genes from this interactome were selected for genetic analysis. Twenty-one tag single-nucleotide polymorphisms (SNPs) which captured all common variation in these genes were genotyped in 10,196 Danes, and analyzed for association with NAFLD-related quantitative traits, type 2 diabetes (T2D), central obesity, and WHO-defined metabolic syndrome (MetS).Results273 genes were included in the protein-protein interaction analysis and EHHADH, ECHS1, HADHA, HADHB, and ACADL were selected for further examination. A total of 10 nominal statistical significant associations (P<0.05) to quantitative metabolic traits were identified. Also, the case-control study showed associations between variation in the five genes and T2D, central obesity, and MetS, respectively. Bonferroni adjustments for multiple testing negated all associations.ConclusionsUsing a bioinformatics approach we identified five candidate genes for NAFLD. However, we failed to provide evidence of associations with major effects between SNPs in these five genes and NAFLD-related quantitative traits, T2D, central obesity, and MetS.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Peptide toxins synthesized by venomous animals have been extensively studied in the last decades. To be useful to the scientific community, this knowledge has been stored, annotated and made easy to retrieve by several databases. The aim of this article is to present what type of information users can access from each database. ArachnoServer and ConoServer focus on spider toxins and cone snail toxins, respectively. UniProtKB, a generalist protein knowledgebase, has an animal toxin-dedicated annotation program that includes toxins from all venomous animals. Finally, the ATDB metadatabase compiles data and annotations from other databases and provides toxin ontology.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Peptide toxins synthesized by venomous animals have been extensively studied in the last decades. To be useful to the scientific community, this knowledge has been stored, annotated and made easy to retrieve by several databases. The aim of this article is to present what type of information users can access from each database. ArachnoServer and ConoServer focus on spider toxins and cone snail toxins, respectively. UniProtKB, a generalist protein knowledgebase, has an animal toxin-dedicated annotation program that includes toxins from all venomous animals. Finally, the ATDB metadatabase compiles data and annotations from other databases and provides toxin ontology.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The paper presents the Multiple Kernel Learning (MKL) approach as a modelling and data exploratory tool and applies it to the problem of wind speed mapping. Support Vector Regression (SVR) is used to predict spatial variations of the mean wind speed from terrain features (slopes, terrain curvature, directional derivatives) generated at different spatial scales. Multiple Kernel Learning is applied to learn kernels for individual features and thematic feature subsets, both in the context of feature selection and optimal parameters determination. An empirical study on real-life data confirms the usefulness of MKL as a tool that enhances the interpretability of data-driven models.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The Powell Basin is a small oceanic basin located at the NE end of the Antarctic Peninsula developed during the Early Miocene and mostly surrounded by the continental crusts of the South Orkney Microcontinent, South Scotia Ridge and Antarctic Peninsula margins. Gravity data from the SCAN 97 cruise obtained with the R/V Hespérides and data from the Global Gravity Grid and Sea Floor Topography (GGSFT) database (Sandwell and Smith, 1997) are used to determine the 3D geometry of the crustal-mantle interface (CMI) by numerical inversion methods. Water layer contribution and sedimentary effects were eliminated from the Free Air anomaly to obtain the total anomaly. Sedimentary effects were obtained from the analysis of existing and new SCAN 97 multichannel seismic profiles (MCS). The regional anomaly was obtained after spectral and filtering processes. The smooth 3D geometry of the crustal mantle interface obtained after inversion of the regional anomaly shows an increase in the thickness of the crust towards the continental margins and a NW-SE oriented axis of symmetry coinciding with the position of an older oceanic spreading axis. This interface shows a moderate uplift towards the western part and depicts two main uplifts to the northern and eastern sectors.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

With the dramatic increase in the volume of experimental results in every domain of life sciences, assembling pertinent data and combining information from different fields has become a challenge. Information is dispersed over numerous specialized databases and is presented in many different formats. Rapid access to experiment-based information about well-characterized proteins helps predict the function of uncharacterized proteins identified by large-scale sequencing. In this context, universal knowledgebases play essential roles in providing access to data from complementary types of experiments and serving as hubs with cross-references to many specialized databases. This review outlines how the value of experimental data is optimized by combining high-quality protein sequences with complementary experimental results, including information derived from protein 3D-structures, using as an example the UniProt knowledgebase (UniProtKB) and the tools and links provided on its website ( http://www.uniprot.org/ ). It also evokes precautions that are necessary for successful predictions and extrapolations.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The present study focuses on single-case data analysis and specifically on two procedures for quantifying differences between baseline and treatment measurements The first technique tested is based on generalized least squares regression analysis and is compared to a proposed non-regression technique, which allows obtaining similar information. The comparison is carried out in the context of generated data representing a variety of patterns (i.e., independent measurements, different serial dependence underlying processes, constant or phase-specific autocorrelation and data variability, different types of trend, and slope and level change). The results suggest that the two techniques perform adequately for a wide range of conditions and researchers can use both of them with certain guarantees. The regression-based procedure offers more efficient estimates, whereas the proposed non-regression procedure is more sensitive to intervention effects. Considering current and previous findings, some tentative recommendations are offered to applied researchers in order to help choosing among the plurality of single-case data analysis techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Advanced neuroinformatics tools are required for methods of connectome mapping, analysis, and visualization. The inherent multi-modality of connectome datasets poses new challenges for data organization, integration, and sharing. We have designed and implemented the Connectome Viewer Toolkit - a set of free and extensible open source neuroimaging tools written in Python. The key components of the toolkit are as follows: (1) The Connectome File Format is an XML-based container format to standardize multi-modal data integration and structured metadata annotation. (2) The Connectome File Format Library enables management and sharing of connectome files. (3) The Connectome Viewer is an integrated research and development environment for visualization and analysis of multi-modal connectome data. The Connectome Viewer's plugin architecture supports extensions with network analysis packages and an interactive scripting shell, to enable easy development and community contributions. Integration with tools from the scientific Python community allows the leveraging of numerous existing libraries for powerful connectome data mining, exploration, and comparison. We demonstrate the applicability of the Connectome Viewer Toolkit using Diffusion MRI datasets processed by the Connectome Mapper. The Connectome Viewer Toolkit is available from http://www.cmtk.org/

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Drug safety issues pose serious health threats to the population and constitute a major cause of mortality worldwide. Due to the prominent implications to both public health and the pharmaceutical industry, it is of great importance to unravel the molecular mechanisms by which an adverse drug reaction can be potentially elicited. These mechanisms can be investigated by placing the pharmaco-epidemiologically detected adverse drug reaction in an information-rich context and by exploiting all currently available biomedical knowledge to substantiate it. We present a computational framework for the biological annotation of potential adverse drug reactions. First, the proposed framework investigates previous evidences on the drug-event association in the context of biomedical literature (signal filtering). Then, it seeks to provide a biological explanation (signal substantiation) by exploring mechanistic connections that might explain why a drug produces a specific adverse reaction. The mechanistic connections include the activity of the drug, related compounds and drug metabolites on protein targets, the association of protein targets to clinical events, and the annotation of proteins (both protein targets and proteins associated with clinical events) to biological pathways. Hence, the workflows for signal filtering and substantiation integrate modules for literature and database mining, in silico drug-target profiling, and analyses based on gene-disease networks and biological pathways. Application examples of these workflows carried out on selected cases of drug safety signals are discussed. The methodology and workflows presented offer a novel approach to explore the molecular mechanisms underlying adverse drug reactions

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Past and current climate change has already induced drastic biological changes. We need projections of how future climate change will further impact biological systems. Modeling is one approach to forecast future ecological impacts, but requires data for model parameterization. As collecting new data is costly, an alternative is to use the increasingly available georeferenced species occurrence and natural history databases. Here, we illustrate the use of such databases to assess climate change impacts on mountain flora. We show that these data can be used effectively to derive dynamic impact scenarios, suggesting upward migration of many species and possible extinctions when no suitable habitat is available at higher elevations. Systematically georeferencing all existing natural history collections data in mountain regions could allow a larger assessment of climate change impact on mountain ecosystems in Europe and elsewhere.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

PURPOSE: Pharmacovigilance methods have advanced greatly during the last decades, making post-market drug assessment an essential drug evaluation component. These methods mainly rely on the use of spontaneous reporting systems and health information databases to collect expertise from huge amounts of real-world reports. The EU-ADR Web Platform was built to further facilitate accessing, monitoring and exploring these data, enabling an in-depth analysis of adverse drug reactions risks.METHODS: The EU-ADR Web Platform exploits the wealth of data collected within a large-scale European initiative, the EU-ADR project. Millions of electronic health records, provided by national health agencies, are mined for specific drug events, which are correlated with literature, protein and pathway data, resulting in a rich drug-event dataset. Next, advanced distributed computing methods are tailored to coordinate the execution of data-mining and statistical analysis tasks. This permits obtaining a ranked drug-event list, removing spurious entries and highlighting relationships with high risk potential.RESULTS: The EU-ADR Web Platform is an open workspace for the integrated analysis of pharmacovigilance datasets. Using this software, researchers can access a variety of tools provided by distinct partners in a single centralized environment. Besides performing standalone drug-event assessments, they can also control the pipeline for an improved batch analysis of custom datasets. Drug-event pairs can be substantiated and statistically analysed within the platform's innovative working environment.CONCLUSIONS: A pioneering workspace that helps in explaining the biological path of adverse drug reactions was developed within the EU-ADR project consortium. This tool, targeted at the pharmacovigilance community, is available online at https://bioinformatics.ua.pt/euadr/. Copyright © 2012 John Wiley & Sons, Ltd.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The main objective of this Master Thesis is to discover more about Girona’s image as a tourism destination from different agents’ perspective and to study its differences on promotion or opinions. In order to meet this objective, three components of Girona’s destination image will be studied: attribute-based component, the holistic component, and the affective component. It is true that a lot of research has been done about tourism destination image, but it is less when we are talking about the destination of Girona. Some studies have already focused on Girona as a tourist destination, but they used a different type of sample and different methodological steps. This study is new among destination studies in the sense that it is based only on textual online data and it follows a methodology based on text-miming. Text-mining is a kind of methodology that allows people extract relevant information from texts. Also, after this information is extracted by this methodology, some statistical multivariate analyses are done with the aim of discovering more about Girona’s tourism image

Relevância:

100.00% 100.00%

Publicador:

Resumo:

We carried out a systematic review of HPV vaccine pre- and post-licensure trials to assess the evidence of their effectiveness and safety. We find that HPV vaccine clinical trials design, and data interpretation of both efficacy and safety outcomes, were largely inadequate. Additionally, we note evidence of selective reporting of results from clinical trials (i.e., exclusion of vaccine efficacy figures related to study subgroups in which efficacy might be lower or even negative from peer-reviewed publications). Given this, the widespread optimism regarding HPV vaccines long-term benefits appears to rest on a number of unproven assumptions (or such which are at odd with factual evidence) and significant misinterpretation of available data. For example, the claim that HPV vaccination will result in approximately 70% reduction of cervical cancers is made despite the fact that the clinical trials data have not demonstrated to date that the vaccines have actually prevented a single case of cervical cancer (let alone cervical cancer death), nor that the current overly optimistic surrogate marker-based extrapolations are justified. Likewise, the notion that HPV vaccines have an impressive safety profile is only supported by highly flawed design of safety trials and is contrary to accumulating evidence from vaccine safety surveillance databases and case reports which continue to link HPV vaccination to serious adverse outcomes (including death and permanent disabilities). We thus conclude that further reduction of cervical cancers might be best achieved by optimizing cervical screening (which carries no such risks) and targeting other factors of the disease rather than by the reliance on vaccines with questionable efficacy and safety profiles.