920 results for Data-representation
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
The design of a network is a solution to several engineering and science problems. Several network design problems are known to be NP-hard, and population-based metaheuristics such as evolutionary algorithms (EAs) have been widely investigated for them. Such optimization methods simultaneously generate a large number of potential solutions to investigate the search space in breadth and, consequently, to avoid local optima. Obtaining a potential solution usually involves the construction and maintenance of several spanning trees or, more generally, spanning forests. To explore the search space efficiently, special data structures have been developed to provide operations that manipulate a set of spanning trees (a population). For a tree with n nodes, the most efficient data structures available in the literature require O(n) time to generate a new spanning tree that modifies an existing one and to store the new solution. We propose a new data structure, called node-depth-degree representation (NDDR), and we demonstrate that using this encoding, generating a new spanning forest requires O(√n) time on average. Experiments with an EA based on NDDR applied to large-scale instances of the degree-constrained minimum spanning tree problem have shown that the implementation adds small constants and lower-order terms to the theoretical bound.
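The encoding idea behind a node-depth-degree list can be illustrated with a minimal Python sketch. It assumes the representation stores (node, depth, degree) triples in depth-first preorder, so that every subtree occupies a contiguous slice of the list. The function names `encode` and `subtree_span` and the example tree are hypothetical; this is not the authors' implementation and does not reproduce the O(√n) average bound.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[int, int, int]  # (node, depth, degree)


def encode(adj: Dict[int, List[int]], root: int) -> List[Entry]:
    """Return (node, depth, degree) triples in depth-first preorder."""
    order: List[Entry] = []
    stack: List[Tuple[int, int, Optional[int]]] = [(root, 0, None)]
    while stack:
        node, depth, parent = stack.pop()
        order.append((node, depth, len(adj[node])))
        # Push children in reverse so they are visited in listed order.
        for child in reversed(adj[node]):
            if child != parent:
                stack.append((child, depth + 1, node))
    return order


def subtree_span(enc: List[Entry], i: int) -> Tuple[int, int]:
    """Half-open index range [i, j) holding the subtree rooted at position i."""
    j = i + 1
    # The subtree ends at the first later entry that is not deeper than its root.
    while j < len(enc) and enc[j][1] > enc[i][1]:
        j += 1
    return i, j


# Hypothetical tree given as an undirected adjacency list.
tree = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
enc = encode(tree, 0)
span = subtree_span(enc, 1)  # slice covering node 1 and its descendants
```

Because a subtree is a contiguous slice, pruning it from one tree and grafting it onto another reduces to list slicing plus a depth adjustment, which is the kind of operation such encodings are designed to make cheap.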
Abstract:
Background: In analyzing the effects of cell treatments such as drug dosing, identifying changes in gene network structure between normal and treated cells is a key task. One way to identify the changes is to compare the structures of networks estimated separately from data on normal and treated cells. However, this approach usually fails to estimate accurate gene networks because of the limited length of the time series data and measurement noise. Approaches that efficiently use time series data from both conditions to identify changes in regulation are therefore needed. Methods: We propose a new statistical approach that is based on the state space representation of the vector autoregressive model and estimates gene networks under two different conditions in order to identify changes in regulation between them. In the mathematical model of our approach, hidden binary variables are introduced to indicate the presence of each regulation under each condition. These hidden binary variables enable efficient data usage: data from both conditions are used for regulations common to both, while only the corresponding data are used for condition-specific regulations. In addition, the similarity of the networks under the two conditions is taken into account automatically through the design of the potential function for the hidden binary variables. To estimate the hidden binary variables, we derive a new variational annealing method that searches for the configuration of the binary variables that maximizes the marginal likelihood. Results: For the performance evaluation, we use time series data from two topologically similar synthetic networks and confirm that our proposed approach estimates commonly existing regulations, as well as changes in regulation, with higher coverage and precision than other existing approaches in almost all the experimental settings.
For a real data application, our proposed approach is applied to time series data from normal human lung cells and from human lung cells treated by stimulating EGF-receptors and dosing with the anticancer drug Gefitinib. In the treated lung cells, a cancer cell condition is simulated by the stimulation of EGF-receptors, but the effect should be counteracted by the selective inhibition of EGF-receptors by Gefitinib. However, the gene expression profiles actually differ between the conditions, and the genes related to the identified changes are considered possible off-targets of Gefitinib. Conclusions: On the synthetically generated time series data, our proposed approach identifies changes in regulation more accurately than existing methods. By applying the proposed approach to the time series data on normal and treated human lung cells, candidate off-target genes of Gefitinib are found. According to published clinical information, one of these genes may be related to a factor in interstitial pneumonia, which is a known side effect of Gefitinib.
Abstract:
Objective: To identify and compare perceptions of pain and ways of coping with it between men and women with central post-stroke pain. Methods: The participants were 25 men and 25 women, aged at least 30 years and with at least four years of schooling, who had presented central post-stroke pain for at least three months. The instruments used were: Mini-Mental State Examination; structured interview for the Brief Psychiatric Scale; Survey of Sociodemographic and Clinical Data; Visual Analogue Scale (VAS); Ways of Coping with Problems Scale (WCPS); Revised Illness Perception Questionnaire (IPQ-R); and Beck Depression Inventory (BDI). Results: A significantly greater number of women used the coping strategy "Turn to spiritual and religious activities" in the WCPS. Women also associated their emotional state with the cause of pain in the IPQ-R. "Distraction of attention" was the strategy most used by the subjects. Conclusion: Women used spiritual and religious activities more as a coping strategy and perceived their emotional state as the cause of pain.
Abstract:
Background: The main goal of translational medicine is to use the knowledge produced by the sciences to promote human health. To make this feasible, we need computational methods that handle the large amount of information arising from bench to bedside and deal with its heterogeneity. One computational challenge is to integrate clinical, socio-demographic and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular ontology-oriented database model that gained popularity due to its robustness and flexibility as a generic platform to store biological data; however, it lacks support for representing clinical and socio-demographic information. Results: We have implemented an extension of Chado, the Clinical Module, to allow the representation of this kind of information. Our approach consists of a framework for data integration through the use of a common reference ontology. The design of this framework has four levels: a data level, to store the data; a semantic level, to integrate and standardize the data by the use of ontologies; an application level, to manage clinical databases, ontologies and the data integration process; and a web interface level, to allow interaction between the user and the system. The Clinical Module was built based on the Entity-Attribute-Value (EAV) model. We also proposed a methodology to migrate data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented and the framework was loaded using data from a factual clinical research database. Clinical and demographic data as well as biomaterial data were obtained from patients with tumors of the head and neck.
We implemented the IPTrans tool, a complete environment for data migration, which comprises: the construction of an ontology-based model to describe the legacy clinical data; the Extraction, Transformation and Load (ETL) process to extract the data from the source clinical database and load it into the Clinical Module of Chado; and the development of a web tool and a Bridge Layer to adapt the web tool, as well as other applications, to Chado. Conclusions: Open-source computational solutions currently available for translational science do not have a model to represent biomolecular information and are not integrated with existing bioinformatics tools. On the other hand, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information coming from different "omics" technologies with patients' clinical and socio-demographic data. This framework should present some features: flexibility, compression and robustness. Experiments based on a use case demonstrated that the proposed system meets the requirements of flexibility and robustness, leading to the desired integration. The Clinical Module can be accessed at http://dcm.ffclrp.usp.br/caib/pg=iptrans.
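The Entity-Attribute-Value model mentioned above can be sketched in a few lines of Python. Each fact is stored as one (entity, attribute, value) row, so new clinical attributes need no schema change. The rows, attribute names and the `pivot` helper below are hypothetical illustrations, not the Chado Clinical Module schema.

```python
from collections import defaultdict

# Hypothetical EAV rows: one (entity, attribute, value) triple per fact.
rows = [
    ("patient_1", "age", "57"),
    ("patient_1", "tumor_site", "larynx"),
    ("patient_2", "age", "63"),
]


def pivot(eav_rows):
    """Group EAV triples into one attribute dictionary per entity."""
    records = defaultdict(dict)
    for entity, attribute, value in eav_rows:
        records[entity][attribute] = value
    return dict(records)


records = pivot(rows)
```

Attributes that were never recorded for an entity are simply absent from its dictionary, which is the flexibility the EAV layout trades against more complex queries.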
Abstract:
A faithful depiction of the tropical atmosphere requires three-dimensional sets of observations. Despite the increasing number of observations presently available, these will hardly ever encompass the entire atmosphere and, in addition, observations have errors. Additional (background) information will always be required to complete the picture. Valuable added information comes from the physical laws governing the flow, usually mediated via a numerical weather prediction (NWP) model. These models are, however, never going to be error-free, which is why a reliable estimate of their errors poses a real challenge, since the whole truth will never be within our grasp. The present thesis addresses the question of improving the analysis procedures for NWP in the tropics. Improvements are sought by addressing the following issues: the efficiency of the internal model adjustment; the potential of reliable background-error information, as compared to observations; the impact of new space-borne line-of-sight wind measurements; and the usefulness of multivariate relationships for data assimilation in the tropics. Most NWP assimilation schemes are effectively univariate near the equator. In this thesis, a multivariate formulation of variational data assimilation in the tropics has been developed. The proposed background-error model supports mass-wind coupling based on convectively coupled equatorial waves. The resulting assimilation model produces balanced analysis increments and thereby increases the efficiency of all types of observations. Idealized adjustment and multivariate analysis experiments highlight the importance of direct wind measurements in the tropics. In particular, the presented results confirm the superiority of wind observations over mass data, in spite of the exact multivariate relationships available from the background information. The internal model adjustment is also more efficient for wind observations than for mass data.
In accordance with these findings, new satellite wind observations are expected to contribute towards the improvement of NWP and climate modeling in the tropics. Although incomplete, the new wind-field information has the potential to reduce uncertainties in the tropical dynamical fields if used together with the existing satellite mass-field measurements. The results obtained by applying the new background-error representation to the tropical short-range forecast errors of a state-of-the-art NWP model suggest that achieving useful tropical multivariate relationships may be feasible within an operational NWP environment.
Abstract:
Degree in Marine Sciences. Faculty of Marine Sciences, University of Las Palmas de Gran Canaria. Institut de Ciències del Mar, Consejo Superior de Investigaciones Científicas
Abstract:
This paper presents experimental measurements of isobaric vapor−liquid equilibria (iso-p VLE) and excess volumes (vE) at several temperatures in the interval (288.15 to 328.15) K for six binary systems composed of two alkyl (methyl, ethyl) propanoates and three odd-carbon alkanes (C5 to C9). The mixing processes were expansive, vE > 0, with (∂vE/∂T)p > 0, and endothermic. The installation used to measure the iso-p VLE was improved by using a PC to control three of the variables involved in the experiments.
Abstract:
Machine learning comprises a series of techniques for the automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data, two issues become relevant: a kernel for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too little information for making correct predictions on unseen data. In fact, it tends to produce a discriminating function that behaves like the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of tree kernels is the time complexity required in both the learning and classification phases. Such complexity can sometimes prevent the application of the kernel in scenarios involving large amounts of data. This thesis proposes three contributions for resolving the above issues of kernels for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing its sparsity with respect to traditional tree kernel functions.
Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower-dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower-dimensional representation, we are able to perform inexact matchings between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted to reducing the computational burden of calculating a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
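The counting idea behind the subtree kernel, and the sparsity problem described above, can be sketched in Python. The sketch assumes ordered labelled trees written as nested (label, children) tuples and counts pairs of nodes whose complete subtrees are identical. The names `subtrees` and `subtree_kernel` and the example trees are hypothetical, and this naive version is not one of the kernels or speed-ups proposed in the thesis.

```python
from collections import Counter


def subtrees(tree, bag=None):
    """Count every complete subtree of `tree`, a hashable (label, children) tuple."""
    if bag is None:
        bag = Counter()
    _label, children = tree
    bag[tree] += 1
    for child in children:
        subtrees(child, bag)
    return bag


def subtree_kernel(t1, t2):
    """Number of node pairs (one node per tree) rooting identical complete subtrees."""
    c1, c2 = subtrees(t1), subtrees(t2)
    return sum(n * c2[s] for s, n in c1.items())


# Hypothetical example trees.
a = ("A", (("b", ()),))
t1 = ("S", (a, a))
t2 = ("S", (a, ("C", ())))
t3 = ("X", (("y", ()),))  # shares no label with t1

k12 = subtree_kernel(t1, t2)
k13 = subtree_kernel(t1, t3)
```

When node labels come from a large domain, most pairs of trees behave like `t1` and `t3`: the kernel value is zero, which is the sparsity that leaves a classifier with little information.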
Abstract:
We present a nonlinear technique to invert strong motion records with the aim of obtaining the final slip and rupture velocity distributions on the fault plane. In this thesis, the ground motion simulation is obtained by evaluating the representation integral in the frequency domain. The Green's tractions are computed using the discrete wave-number integration technique, which provides the full wave-field in a 1D layered propagation medium. The representation integral is computed through a finite element technique based on a Delaunay triangulation of the fault plane. The rupture velocity is defined on a coarser regular grid and rupture times are computed by integration of the eikonal equation. For the inversion, the slip distribution is parameterized by 2D overlapping Gaussian functions, which make it easy to relate the spectrum of the possible solutions to the minimum resolvable wavelength, itself related to the source-station distribution and data processing. The inverse problem is solved by a two-step procedure aimed at separating the computation of the rupture velocity from the evaluation of the slip distribution, the latter being a linear problem when the rupture velocity is fixed. The nonlinear step is solved by optimization of an L2 misfit function between synthetic and real seismograms, and the solution is searched for using the Neighbourhood Algorithm. The linear step is instead solved with the conjugate gradient method. The developed methodology has been applied to the M7.2, Iwate Nairiku Miyagi, Japan, earthquake. The estimated seismic moment is 2.63 × 10^26 dyne∙cm, which corresponds to a moment magnitude MW 6.9, while the mean rupture velocity is 2.0 km/s. A large slip patch extends from the hypocenter to the southern shallow part of the fault plane. A second, relatively large slip patch is found in the northern shallow part. Finally, we give a quantitative estimate of the errors associated with the parameters.
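The relation between seismic moment and moment magnitude quoted above can be checked with a short Python sketch. It assumes the standard Hanks-Kanamori relation MW = (2/3) log10(M0) − 10.7 with M0 in dyne∙cm, and it assumes the reported seismic moment reads 2.63 × 10^26 dyne∙cm; the function name is hypothetical.

```python
import math


def moment_magnitude(m0_dyne_cm: float) -> float:
    """Moment magnitude from seismic moment in dyne*cm (Hanks-Kanamori relation)."""
    return (2.0 / 3.0) * math.log10(m0_dyne_cm) - 10.7


# Assumed reading of the seismic moment in the abstract.
mw = moment_magnitude(2.63e26)
```

Under these assumptions the result rounds to 6.9, consistent with the moment magnitude stated in the abstract.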
Abstract:
The aim of this thesis is to apply multilevel regression models in the context of household surveys. The hierarchical structure in this type of data is characterized by many small groups. In recent years, comparative and multilevel analyses in the field of perceived health have grown in number. The purpose of this thesis is to develop a multilevel analysis with three levels of hierarchy for the Physical Component Summary outcome in order to: evaluate the magnitude of within- and between-group variance at each level (individual, household and municipality); explore which covariates affect perceived physical health at each level; compare model-based and design-based approaches in order to establish the informativeness of the sampling design; and estimate a quantile regression for hierarchical data. The target population is Italian residents aged 18 years and older. Our study shows a high degree of homogeneity among level-1 units belonging to the same group, with an intraclass correlation of 27% in a level-2 null model. Almost all of the variance is explained by level-1 covariates. In fact, in our model the explanatory variables with the greatest impact on the outcome are disability, inability to work, age and chronic diseases (18 pathologies). An additional analysis is performed using a novel procedure, the "Linear Quantile Mixed Model", also called "Multilevel Linear Quantile Regression". This makes it possible to describe the conditional distribution of the response more generally through the estimation of its quantiles, while accounting for the dependence among the observations. This represents a great advantage of our models over classic multilevel regression. Median regression with random effects proves to be more efficient than mean regression in representing the central tendency of the outcome. A more detailed analysis of the conditional distribution of the response at other quantiles highlighted a differential effect of some covariates along the distribution.
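The intraclass correlation reported above is, in a two-level null model, the share of total variance that lies between groups. The Python sketch below shows that ratio under this standard definition. The function name and the variance components 27 and 73 are hypothetical values chosen only to reproduce a 27% ratio; they are not estimates from the thesis.

```python
def intraclass_correlation(between_var: float, within_var: float) -> float:
    """Share of total variance due to differences between groups."""
    return between_var / (between_var + within_var)


# Hypothetical variance components, not taken from the study.
icc = intraclass_correlation(between_var=27.0, within_var=73.0)
```

An ICC of 0.27 means that two individuals from the same group (here, the same household) are noticeably more alike on the outcome than two individuals drawn at random.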
Abstract:
Assessment of the integrity of structural components is of great importance for aerospace systems, land and marine transportation, civil infrastructures and other biological and mechanical applications. Guided wave (GW) based inspections are an attractive means for structural health monitoring. In this thesis, the study and development of techniques for GW ultrasound signal analysis and compression in the context of non-destructive testing of structures are presented. In guided wave inspections, it is necessary to address the problem of dispersion compensation. A signal processing approach based on frequency warping was adopted. This operator maps the frequency axis through a function derived from the group velocity of the test material, and it is used to remove the dependence on the travelled distance from the acquired signals. This processing strategy was successfully applied to impact location and damage localization tasks in composite and aluminum panels. It has been shown that, based on this processing tool, low-power embedded systems for GW structural monitoring can be implemented. Finally, a new procedure based on Compressive Sensing has been developed and applied for data reduction. This procedure also has a beneficial effect in enhancing the accuracy of structural defect localization. The algorithm uses the convolutive model of the propagation of ultrasonic guided waves, which takes advantage of a sparse signal representation in the warped frequency domain. The recovery from the compressed samples is based on an alternating minimization procedure which achieves both an accurate reconstruction of the ultrasonic signal and a precise estimation of the waves' time of flight. This information is used to feed hyperbolic or elliptic localization procedures for accurate impact or damage localization.
Abstract:
In this study the population structure and connectivity of the Mediterranean and Atlantic Raja clavata (L., 1758) were investigated by analyzing the genetic variation of six population samples (N = 144) at seven nuclear microsatellite loci. The genetic dataset was generated by selecting population samples available in the tissue databases of the GenoDREAM laboratory (University of Bologna) and of the Department of Life Sciences and Environment (University of Cagliari), all collected during past scientific surveys (MEDITS, GRUND) from different geographical locations in the Mediterranean basin and the North-east Atlantic, such as the North Sea, the coasts of Sardinia and Tuscany, and Cyprus. This thesis aims to estimate the genetic diversity and differentiation among the six geographical samples and, in particular, to assess the presence of any barrier (geographic, hydrogeological or biological) to gene flow by evaluating both genetic diversity (nucleotide diversity, observed and expected heterozygosity, Hardy-Weinberg equilibrium analysis) and population differentiation (Fst estimates, population structure analysis). In addition to the molecular analysis, quantitative representation and statistical analysis of individual body shape are performed using geometric morphometric methods and statistical tests. Geometric coordinates called landmarks are placed on 158 individuals belonging to two population samples of Raja clavata and to population samples of the closely related species Raja straeleni (cryptic sibling) and Raja asterias, to assess significant morphological differences at multiple taxonomic levels. The results obtained from the analysis of the microsatellite dataset suggested a geographic and genetic separation between populations from the Central-Western and Eastern Mediterranean basins.
Furthermore, the analysis showed no separation between the geographic samples from the North Atlantic Ocean and the Central-Western Mediterranean, grouping them into a single panmictic population. The landmark-based geometric morphometric results showed significant differences in body shape that discriminate taxa at the tested levels (from species to populations).
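The observed and expected heterozygosity mentioned among the diversity measures can be sketched in Python for a single locus. The sketch assumes diploid genotypes given as allele pairs, takes observed heterozygosity as the fraction of heterozygous individuals and expected heterozygosity as 1 − Σ p_i² over allele frequencies p_i, without a small-sample correction. The function name and the allele sizes are hypothetical, not data from the study.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one diploid locus."""
    n = len(genotypes)
    # Observed: fraction of individuals carrying two different alleles.
    observed = sum(a != b for a, b in genotypes) / n
    # Expected under Hardy-Weinberg: 1 minus the sum of squared allele frequencies.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected


# Hypothetical microsatellite genotypes (allele sizes in base pairs).
genotypes = [(120, 124), (120, 120), (124, 124), (120, 124)]
ho, he = heterozygosity(genotypes)
```

A large gap between the two values at a locus is what a Hardy-Weinberg equilibrium test examines formally.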
Abstract:
Local to regional climate anomalies are to a large extent determined by the state of the atmospheric circulation. The knowledge of large-scale sea level pressure (SLP) variations in former times is therefore crucial when addressing past climate changes across Europe and the Mediterranean. However, currently available SLP reconstructions lack data from the ocean, particularly in the pre-1850 period. Here we present a new statistically-derived 5° × 5° resolved gridded seasonal SLP dataset covering the eastern North Atlantic, Europe and the Mediterranean area (40°W–50°E; 20°N–70°N) back to 1750 using terrestrial instrumental pressure series and marine wind information from ship logbooks. For the period 1750–1850, the new SLP reconstruction provides a more accurate representation of the strength of the winter westerlies as well as the location and variability of the Azores High than currently available multiproxy pressure field reconstructions. These findings strongly support the potential of ship logbooks as an important source to determine past circulation variations especially for the pre-1850 period. This new dataset can be further used for dynamical studies relating large-scale atmospheric circulation to temperature and precipitation variability over the Mediterranean and Eurasia, for the comparison with outputs from GCMs as well as for detection and attribution studies.