957 resultados para Data-representation
Resumo:
Background The use of the knowledge produced by sciences to promote human health is the main goal of translational medicine. To make it feasible we need computational methods to handle the large amount of information that arises from bench to bedside and to deal with its heterogeneity. A computational challenge that must be faced is to promote the integration of clinical, socio-demographic and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular ontology-oriented database model that gained popularity due to its robustness and flexibility as a generic platform to store biological data; however it lacks supporting representation of clinical and socio-demographic information. Results We have implemented an extension of Chado – the Clinical Module - to allow the representation of this kind of information. Our approach consists of a framework for data integration through the use of a common reference ontology. The design of this framework has four levels: data level, to store the data; semantic level, to integrate and standardize the data by the use of ontologies; application level, to manage clinical databases, ontologies and data integration process; and web interface level, to allow interaction between the user and the system. The clinical module was built based on the Entity-Attribute-Value (EAV) model. We also proposed a methodology to migrate data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented and the framework was loaded using data from a factual clinical research database. Clinical and demographic data as well as biomaterial data were obtained from patients with tumors of head and neck. We implemented the IPTrans tool that is a complete environment for data migration, which comprises: the construction of a model to describe the legacy clinical data, based on an ontology; the Extraction, Transformation and Load (ETL) process to extract the data from the source clinical database and load it in the Clinical Module of Chado; the development of a web tool and a Bridge Layer to adapt the web tool to Chado, as well as other applications. Conclusions Open-source computational solutions currently available for translational science does not have a model to represent biomolecular information and also are not integrated with the existing bioinformatics tools. On the other hand, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information coming from different “omics” technologies with patient’s clinical and socio-demographic data. This framework should present some features: flexibility, compression and robustness. The experiments accomplished from a use case demonstrated that the proposed system meets requirements of flexibility and robustness, leading to the desired integration. The Clinical Module can be accessed in http://dcm.ffclrp.usp.br/caib/pg=iptrans webcite.
Resumo:
A faithful depiction of the tropical atmosphere requires three-dimensional sets of observations. Despite the increasing amount of observations presently available, these will hardly ever encompass the entire atmosphere and, in addition, observations have errors. Additional (background) information will always be required to complete the picture. Valuable added information comes from the physical laws governing the flow, usually mediated via a numerical weather prediction (NWP) model. These models are, however, never going to be error-free, why a reliable estimate of their errors poses a real challenge since the whole truth will never be within our grasp. The present thesis addresses the question of improving the analysis procedures for NWP in the tropics. Improvements are sought by addressing the following issues: - the efficiency of the internal model adjustment, - the potential of the reliable background-error information, as compared to observations, - the impact of a new, space-borne line-of-sight wind measurements, and - the usefulness of multivariate relationships for data assimilation in the tropics. Most NWP assimilation schemes are effectively univariate near the equator. In this thesis, a multivariate formulation of the variational data assimilation in the tropics has been developed. The proposed background-error model supports the mass-wind coupling based on convectively-coupled equatorial waves. The resulting assimilation model produces balanced analysis increments and hereby increases the efficiency of all types of observations. Idealized adjustment and multivariate analysis experiments highlight the importance of direct wind measurements in the tropics. In particular, the presented results confirm the superiority of wind observations compared to mass data, in spite of the exact multivariate relationships available from the background information. The internal model adjustment is also more efficient for wind observations than for mass data. In accordance with these findings, new satellite wind observations are expected to contribute towards the improvement of NWP and climate modeling in the tropics. Although incomplete, the new wind-field information has the potential to reduce uncertainties in the tropical dynamical fields, if used together with the existing satellite mass-field measurements. The results obtained by applying the new background-error representation to the tropical short-range forecast errors of a state-of-art NWP model suggest that achieving useful tropical multivariate relationships may be feasible within an operational NWP environment.
Resumo:
Degree in Marine Sciences. Faculty of Marine Sciences, University of Las Palmas de Gran Canaria. Institut de Ciències del Mar, Consejo Superior de Investigaciones Científicas
Resumo:
[EN]This paper presents the experimental measurements of isobaric vapor−liquid equilibria (iso-p VLE) and excess volumes (vE) at several temperatures in the interval (288.15 to 328.15) K for six binary systems composed of two alkyl (methyl, ethyl) propanoates and three odd carbon alkanes (C5 to C9). The mixing processes were expansive, vE > 0, with (δvE/δT)p > 0, and endothermic. The installation used to measure the iso-p VLE was improved by controlling three of the variables involved in the experimentation with a PC.
Resumo:
Machine learning comprises a series of techniques for automatic extraction of meaningful information from large collections of noisy data. In many real world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among all the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree structured data two issues become relevant: kernel for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too few information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function behaving as the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernel, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required both in learning and classification phases. Such a complexity can sometimes prevents the kernel application in scenarios involving large amount of data. This thesis proposes three contributions for resolving the above issues of kernel for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions. Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower dimensional representation, we are able to perform inexact matchings between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. Convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted at reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Direct Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
Resumo:
We present a non linear technique to invert strong motion records with the aim of obtaining the final slip and rupture velocity distributions on the fault plane. In this thesis, the ground motion simulation is obtained evaluating the representation integral in the frequency. The Green’s tractions are computed using the discrete wave-number integration technique that provides the full wave-field in a 1D layered propagation medium. The representation integral is computed through a finite elements technique, based on a Delaunay’s triangulation on the fault plane. The rupture velocity is defined on a coarser regular grid and rupture times are computed by integration of the eikonal equation. For the inversion, the slip distribution is parameterized by 2D overlapping Gaussian functions, which can easily relate the spectrum of the possible solutions with the minimum resolvable wavelength, related to source-station distribution and data processing. The inverse problem is solved by a two-step procedure aimed at separating the computation of the rupture velocity from the evaluation of the slip distribution, the latter being a linear problem, when the rupture velocity is fixed. The non-linear step is solved by optimization of an L2 misfit function between synthetic and real seismograms, and solution is searched by the use of the Neighbourhood Algorithm. The conjugate gradient method is used to solve the linear step instead. The developed methodology has been applied to the M7.2, Iwate Nairiku Miyagi, Japan, earthquake. The estimated magnitude seismic moment is 2.6326 dyne∙cm that corresponds to a moment magnitude MW 6.9 while the mean the rupture velocity is 2.0 km/s. A large slip patch extends from the hypocenter to the southern shallow part of the fault plane. A second relatively large slip patch is found in the northern shallow part. Finally, we gave a quantitative estimation of errors associates with the parameters.
Resumo:
The aim of this thesis is to apply multilevel regression model in context of household surveys. Hierarchical structure in this type of data is characterized by many small groups. In last years comparative and multilevel analysis in the field of perceived health have grown in size. The purpose of this thesis is to develop a multilevel analysis with three level of hierarchy for Physical Component Summary outcome to: evaluate magnitude of within and between variance at each level (individual, household and municipality); explore which covariates affect on perceived physical health at each level; compare model-based and design-based approach in order to establish informativeness of sampling design; estimate a quantile regression for hierarchical data. The target population are the Italian residents aged 18 years and older. Our study shows a high degree of homogeneity within level 1 units belonging from the same group, with an intraclass correlation of 27% in a level-2 null model. Almost all variance is explained by level 1 covariates. In fact, in our model the explanatory variables having more impact on the outcome are disability, unable to work, age and chronic diseases (18 pathologies). An additional analysis are performed by using novel procedure of analysis :"Linear Quantile Mixed Model", named "Multilevel Linear Quantile Regression", estimate. This give us the possibility to describe more generally the conditional distribution of the response through the estimation of its quantiles, while accounting for the dependence among the observations. This has represented a great advantage of our models with respect to classic multilevel regression. The median regression with random effects reveals to be more efficient than the mean regression in representation of the outcome central tendency. A more detailed analysis of the conditional distribution of the response on other quantiles highlighted a differential effect of some covariate along the distribution.
Resumo:
Assessment of the integrity of structural components is of great importance for aerospace systems, land and marine transportation, civil infrastructures and other biological and mechanical applications. Guided waves (GWs) based inspections are an attractive mean for structural health monitoring. In this thesis, the study and development of techniques for GW ultrasound signal analysis and compression in the context of non-destructive testing of structures will be presented. In guided wave inspections, it is necessary to address the problem of the dispersion compensation. A signal processing approach based on frequency warping was adopted. Such operator maps the frequencies axis through a function derived by the group velocity of the test material and it is used to remove the dependence on the travelled distance from the acquired signals. Such processing strategy was fruitfully applied for impact location and damage localization tasks in composite and aluminum panels. It has been shown that, basing on this processing tool, low power embedded system for GW structural monitoring can be implemented. Finally, a new procedure based on Compressive Sensing has been developed and applied for data reduction. Such procedure has also a beneficial effect in enhancing the accuracy of structural defects localization. This algorithm uses the convolutive model of the propagation of ultrasonic guided waves which takes advantage of a sparse signal representation in the warped frequency domain. The recovery from the compressed samples is based on an alternating minimization procedure which achieves both an accurate reconstruction of the ultrasonic signal and a precise estimation of waves time of flight. Such information is used to feed hyperbolic or elliptic localization procedures, for accurate impact or damage localization.
Resumo:
In this study the population structure and connectivity of the Mediterranean and Atlantic Raja clavata (L., 1758) were investigated by analyzing the genetic variation of six population samples (N = 144) at seven nuclear microsatellite loci. The genetic dataset was generated by selecting population samples available in the tissue databases of the GenoDREAM laboratory (University of Bologna) and of the Department of Life Sciences and Environment (University of Cagliari), all collected during past scientific surveys (MEDITS, GRUND) from different geographical locations in the Mediterranean basin and North-east Atlantic sea, as North Sea, Sardinian coasts, Tuscany coasts and Cyprus Island. This thesis deals with to estimate the genetic diversity and differentiation among 6 geographical samples, in particular, to assess the presence of any barrier (geographic, hydrogeological or biological) to gene flow evaluating both the genetic diversity (nucleotide diversity, observed and expected heterozygosity, Hardy- Weinberg equilibrium analysis) and population differentiation (Fst estimates, population structure analysis). In addition to molecular analysis, quantitative representation and statistical analysis of morphological individuals shape are performed using geometric morphometrics methods and statistical tests. Geometric coordinates call landmarks are fixed in 158 individuals belonging to two population samples of Raja clavata and in population samples of closely related species, Raja straeleni (cryptic sibling) and Raja asterias, to assess significant morphological differences at multiple taxonomic levels. The results obtained from the analysis of the microsatellite dataset suggested a geographic and genetic separation between populations from Central-Western and Eastern Mediterranean basins. Furthermore, the analysis also showed that there was no separation between geographic samples from North Atlantic Ocean and central-Western Mediterranean, grouping them to a panmictic population. The Landmark-based geometric morphometry method results showed significant differences of body shape able to discriminate taxa at tested levels (from species to populations).
Resumo:
Local to regional climate anomalies are to a large extent determined by the state of the atmospheric circulation. The knowledge of large-scale sea level pressure (SLP) variations in former times is therefore crucial when addressing past climate changes across Europe and the Mediterranean. However, currently available SLP reconstructions lack data from the ocean, particularly in the pre-1850 period. Here we present a new statistically-derived 5° × 5° resolved gridded seasonal SLP dataset covering the eastern North Atlantic, Europe and the Mediterranean area (40°W–50°E; 20°N–70°N) back to 1750 using terrestrial instrumental pressure series and marine wind information from ship logbooks. For the period 1750–1850, the new SLP reconstruction provides a more accurate representation of the strength of the winter westerlies as well as the location and variability of the Azores High than currently available multiproxy pressure field reconstructions. These findings strongly support the potential of ship logbooks as an important source to determine past circulation variations especially for the pre-1850 period. This new dataset can be further used for dynamical studies relating large-scale atmospheric circulation to temperature and precipitation variability over the Mediterranean and Eurasia, for the comparison with outputs from GCMs as well as for detection and attribution studies.
Resumo:
High density spatial and temporal sampling of EEG data enhances the quality of results of electrophysiological experiments. Because EEG sources typically produce widespread electric fields (see Chapter 3) and operate at frequencies well below the sampling rate, increasing the number of electrodes and time samples will not necessarily increase the number of observed processes, but mainly increase the accuracy of the representation of these processes. This is namely the case when inverse solutions are computed. As a consequence, increasing the sampling in space and time increases the redundancy of the data (in space, because electrodes are correlated due to volume conduction, and time, because neighboring time points are correlated), while the degrees of freedom of the data change only little. This has to be taken into account when statistical inferences are to be made from the data. However, in many ERP studies, the intrinsic correlation structure of the data has been disregarded. Often, some electrodes or groups of electrodes are a priori selected as the analysis entity and considered as repeated (within subject) measures that are analyzed using standard univariate statistics. The increased spatial resolution obtained with more electrodes is thus poorly represented by the resulting statistics. In addition, the assumptions made (e.g. in terms of what constitutes a repeated measure) are not supported by what we know about the properties of EEG data. From the point of view of physics (see Chapter 3), the natural “atomic” analysis entity of EEG and ERP data is the scalp electric field
Resumo:
Identifying and comparing different steady states is an important task for clinical decision making. Data from unequal sources, comprising diverse patient status information, have to be interpreted. In order to compare results an expressive representation is the key. In this contribution we suggest a criterion to calculate a context-sensitive value based on variance analysis and discuss its advantages and limitations referring to a clinical data example obtained during anesthesia. Different drug plasma target levels of the anesthetic propofol were preset to reach and maintain clinically desirable steady state conditions with target controlled infusion (TCI). At the same time systolic blood pressure was monitored, depth of anesthesia was recorded using the bispectral index (BIS) and propofol plasma concentrations were determined in venous blood samples. The presented analysis of variance (ANOVA) is used to quantify how accurately steady states can be monitored and compared using the three methods of measurement.
Resumo:
This paper presents problems arising from the lack of standardized methods for recording skeletal remains. Using practical examples it is shown how preservation and representation of bones can distort observations and how this can be reduced by systematic data acquisition.
Resumo:
Traditionally, ontologies describe knowledge representation in a denotational, formalized, and deductive way. In addition, in this paper, we propose a semiotic, inductive, and approximate approach to ontology creation. We define a conceptual framework, a semantics extraction algorithm, and a first proof of concept applying the algorithm to a small set of Wikipedia documents. Intended as an extension to the prevailing top-down ontologies, we introduce an inductive fuzzy grassroots ontology, which organizes itself organically from existing natural language Web content. Using inductive and approximate reasoning to reflect the natural way in which knowledge is processed, the ontology’s bottom-up build process creates emergent semantics learned from the Web. By this means, the ontology acts as a hub for computing with words described in natural language. For Web users, the structural semantics are visualized as inductive fuzzy cognitive maps, allowing an initial form of intelligence amplification. Eventually, we present an implementation of our inductive fuzzy grassroots ontology Thus,this paper contributes an algorithm for the extraction of fuzzy grassroots ontologies from Web data by inductive fuzzy classification.