807 resultados para Information Model
Resumo:
The need for a convergence between semi-structured data management and Information Retrieval techniques is manifest to the scientific community. In order to fulfil this growing request, W3C has recently proposed XQuery Full Text, an IR-oriented extension of XQuery. However, the issue of query optimization requires the study of important properties like query equivalence and containment; to this aim, a formal representation of document and queries is needed. The goal of this thesis is to establish such formal background. We define a data model for XML documents and propose an algebra able to represent most of XQuery Full-Text expressions. We show how an XQuery Full-Text expression can be translated into an algebraic expression and how an algebraic expression can be optimized.
Resumo:
This thesis proposes a new document model, according to which any document can be segmented in some independent components and transformed in a pattern-based projection, that only uses a very small set of objects and composition rules. The point is that such a normalized document expresses the same fundamental information of the original one, in a simple, clear and unambiguous way. The central part of my work consists of discussing that model, investigating how a digital document can be segmented, and how a segmented version can be used to implement advanced tools of conversion. I present seven patterns which are versatile enough to capture the most relevant documents’ structures, and whose minimality and rigour make that implementation possible. The abstract model is then instantiated into an actual markup language, called IML. IML is a general and extensible language, which basically adopts an XHTML syntax, able to capture a posteriori the only content of a digital document. It is compared with other languages and proposals, in order to clarify its role and objectives. Finally, I present some systems built upon these ideas. These applications are evaluated in terms of users’ advantages, workflow improvements and impact over the overall quality of the output. In particular, they cover heterogeneous content management processes: from web editing to collaboration (IsaWiki and WikiFactory), from e-learning (IsaLearning) to professional printing (IsaPress).
Resumo:
This thesis describes modelling tools and methods suited for complex systems (systems that typically are represented by a plurality of models). The basic idea is that all models representing the system should be linked by well-defined model operations in order to build a structured repository of information, a hierarchy of models. The port-Hamiltonian framework is a good candidate to solve this kind of problems as it supports the most important model operations natively. The thesis in particular addresses the problem of integrating distributed parameter systems in a model hierarchy, and shows two possible mechanisms to do that: a finite-element discretization in port-Hamiltonian form, and a structure-preserving model order reduction for discretized models obtainable from commercial finite-element packages.
Resumo:
The presented study carried out an analysis on rural landscape changes. In particular the study focuses on the understanding of driving forces acting on the rural built environment using a statistical spatial model implemented through GIS techniques. It is well known that the study of landscape changes is essential for a conscious decision making in land planning. From a bibliography review results a general lack of studies dealing with the modeling of rural built environment and hence a theoretical modelling approach for such purpose is needed. The advancement in technology and modernity in building construction and agriculture have gradually changed the rural built environment. In addition, the phenomenon of urbanization of a determined the construction of new volumes that occurred beside abandoned or derelict rural buildings. Consequently there are two types of transformation dynamics affecting mainly the rural built environment that can be observed: the conversion of rural buildings and the increasing of building numbers. It is the specific aim of the presented study to propose a methodology for the development of a spatial model that allows the identification of driving forces that acted on the behaviours of the building allocation. In fact one of the most concerning dynamic nowadays is related to an irrational expansion of buildings sprawl across landscape. The proposed methodology is composed by some conceptual steps that cover different aspects related to the development of a spatial model: the selection of a response variable that better describe the phenomenon under study, the identification of possible driving forces, the sampling methodology concerning the collection of data, the most suitable algorithm to be adopted in relation to statistical theory and method used, the calibration process and evaluation of the model. A different combination of factors in various parts of the territory generated favourable or less favourable conditions for the building allocation and the existence of buildings represents the evidence of such optimum. Conversely the absence of buildings expresses a combination of agents which is not suitable for building allocation. Presence or absence of buildings can be adopted as indicators of such driving conditions, since they represent the expression of the action of driving forces in the land suitability sorting process. The existence of correlation between site selection and hypothetical driving forces, evaluated by means of modeling techniques, provides an evidence of which driving forces are involved in the allocation dynamic and an insight on their level of influence into the process. GIS software by means of spatial analysis tools allows to associate the concept of presence and absence with point futures generating a point process. Presence or absence of buildings at some site locations represent the expression of these driving factors interaction. In case of presences, points represent locations of real existing buildings, conversely absences represent locations were buildings are not existent and so they are generated by a stochastic mechanism. Possible driving forces are selected and the existence of a causal relationship with building allocations is assessed through a spatial model. The adoption of empirical statistical models provides a mechanism for the explanatory variable analysis and for the identification of key driving variables behind the site selection process for new building allocation. The model developed by following the methodology is applied to a case study to test the validity of the methodology. In particular the study area for the testing of the methodology is represented by the New District of Imola characterized by a prevailing agricultural production vocation and were transformation dynamic intensively occurred. The development of the model involved the identification of predictive variables (related to geomorphologic, socio-economic, structural and infrastructural systems of landscape) capable of representing the driving forces responsible for landscape changes.. The calibration of the model is carried out referring to spatial data regarding the periurban and rural area of the study area within the 1975-2005 time period by means of Generalised linear model. The resulting output from the model fit is continuous grid surface where cells assume values ranged from 0 to 1 of probability of building occurrences along the rural and periurban area of the study area. Hence the response variable assesses the changes in the rural built environment occurred in such time interval and is correlated to the selected explanatory variables by means of a generalized linear model using logistic regression. Comparing the probability map obtained from the model to the actual rural building distribution in 2005, the interpretation capability of the model can be evaluated. The proposed model can be also applied to the interpretation of trends which occurred in other study areas, and also referring to different time intervals, depending on the availability of data. The use of suitable data in terms of time, information, and spatial resolution and the costs related to data acquisition, pre-processing, and survey are among the most critical aspects of model implementation. Future in-depth studies can focus on using the proposed model to predict short/medium-range future scenarios for the rural built environment distribution in the study area. In order to predict future scenarios it is necessary to assume that the driving forces do not change and that their levels of influence within the model are not far from those assessed for the time interval used for the calibration.
Resumo:
The aim of the thesi is to formulate a suitable Item Response Theory (IRT) based model to measure HRQoL (as latent variable) using a mixed responses questionnaire and relaxing the hypothesis of normal distributed latent variable. The new model is a combination of two models already presented in literature, that is, a latent trait model for mixed responses and an IRT model for Skew Normal latent variable. It is developed in a Bayesian framework, a Markov chain Monte Carlo procedure is used to generate samples of the posterior distribution of the parameters of interest. The proposed model is test on a questionnaire composed by 5 discrete items and one continuous to measure HRQoL in children, the EQ-5D-Y questionnaire. A large sample of children collected in the schools was used. In comparison with a model for only discrete responses and a model for mixed responses and normal latent variable, the new model has better performances, in term of deviance information criterion (DIC), chain convergences times and precision of the estimates.
Resumo:
The central objective of research in Information Retrieval (IR) is to discover new techniques to retrieve relevant information in order to satisfy an Information Need. The Information Need is satisfied when relevant information can be provided to the user. In IR, relevance is a fundamental concept which has changed over time, from popular to personal, i.e., what was considered relevant before was information for the whole population, but what is considered relevant now is specific information for each user. Hence, there is a need to connect the behavior of the system to the condition of a particular person and his social context; thereby an interdisciplinary sector called Human-Centered Computing was born. For the modern search engine, the information extracted for the individual user is crucial. According to the Personalized Search (PS), two different techniques are necessary to personalize a search: contextualization (interconnected conditions that occur in an activity), and individualization (characteristics that distinguish an individual). This movement of focus to the individual's need undermines the rigid linearity of the classical model overtaken the ``berry picking'' model which explains that the terms change thanks to the informational feedback received from the search activity introducing the concept of evolution of search terms. The development of Information Foraging theory, which observed the correlations between animal foraging and human information foraging, also contributed to this transformation through attempts to optimize the cost-benefit ratio. This thesis arose from the need to satisfy human individuality when searching for information, and it develops a synergistic collaboration between the frontiers of technological innovation and the recent advances in IR. The search method developed exploits what is relevant for the user by changing radically the way in which an Information Need is expressed, because now it is expressed through the generation of the query and its own context. As a matter of fact the method was born under the pretense to improve the quality of search by rewriting the query based on the contexts automatically generated from a local knowledge base. Furthermore, the idea of optimizing each IR system has led to develop it as a middleware of interaction between the user and the IR system. Thereby the system has just two possible actions: rewriting the query, and reordering the result. Equivalent actions to the approach was described from the PS that generally exploits information derived from analysis of user behavior, while the proposed approach exploits knowledge provided by the user. The thesis went further to generate a novel method for an assessment procedure, according to the "Cranfield paradigm", in order to evaluate this type of IR systems. The results achieved are interesting considering both the effectiveness achieved and the innovative approach undertaken together with the several applications inspired using a local knowledge base.
Resumo:
In the present work we perform an econometric analysis of the Tribal art market. To this aim, we use a unique and original database that includes information on Tribal art market auctions worldwide from 1998 to 2011. In Literature, art prices are modelled through the hedonic regression model, a classic fixed-effect model. The main drawback of the hedonic approach is the large number of parameters, since, in general, art data include many categorical variables. In this work, we propose a multilevel model for the analysis of Tribal art prices that takes into account the influence of time on artwork prices. In fact, it is natural to assume that time exerts an influence over the price dynamics in various ways. Nevertheless, since the set of objects change at every auction date, we do not have repeated measurements of the same items over time. Hence, the dataset does not constitute a proper panel; rather, it has a two-level structure in that items, level-1 units, are grouped in time points, level-2 units. The main theoretical contribution is the extension of classical multilevel models to cope with the case described above. In particular, we introduce a model with time dependent random effects at the second level. We propose a novel specification of the model, derive the maximum likelihood estimators and implement them through the E-M algorithm. We test the finite sample properties of the estimators and the validity of the own-written R-code by means of a simulation study. Finally, we show that the new model improves considerably the fit of the Tribal art data with respect to both the hedonic regression model and the classic multilevel model.
Resumo:
The objective of this work is to characterize the genome of the chromosome 1 of A.thaliana, a small flowering plants used as a model organism in studies of biology and genetics, on the basis of a recent mathematical model of the genetic code. I analyze and compare different portions of the genome: genes, exons, coding sequences (CDS), introns, long introns, intergenes, untranslated regions (UTR) and regulatory sequences. In order to accomplish the task, I transformed nucleotide sequences into binary sequences based on the definition of the three different dichotomic classes. The descriptive analysis of binary strings indicate the presence of regularities in each portion of the genome considered. In particular, there are remarkable differences between coding sequences (CDS and exons) and non-coding sequences, suggesting that the frame is important only for coding sequences and that dichotomic classes can be useful to recognize them. Then, I assessed the existence of short-range dependence between binary sequences computed on the basis of the different dichotomic classes. I used three different measures of dependence: the well-known chi-squared test and two indices derived from the concept of entropy i.e. Mutual Information (MI) and Sρ, a normalized version of the “Bhattacharya Hellinger Matusita distance”. The results show that there is a significant short-range dependence structure only for the coding sequences whose existence is a clue of an underlying error detection and correction mechanism. No doubt, further studies are needed in order to assess how the information carried by dichotomic classes could discriminate between coding and noncoding sequence and, therefore, contribute to unveil the role of the mathematical structure in error detection and correction mechanisms. Still, I have shown the potential of the approach presented for understanding the management of genetic information.
Resumo:
The motivating problem concerns the estimation of the growth curve of solitary corals that follow the nonlinear Von Bertalanffy Growth Function (VBGF). The most common parameterization of the VBGF for corals is based on two parameters: the ultimate length L∞ and the growth rate k. One aim was to find a more reliable method for estimating these parameters, which can capture the influence of environmental covariates. The main issue with current methods is that they force the linearization of VBGF and neglect intra-individual variability. The idea was to use the hierarchical nonlinear model which has the appealing features of taking into account the influence of collection sites, possible intra-site measurement correlation and variance heterogeneity, and that can handle the influence of environmental factors and all the reliable information that might influence coral growth. This method was used on two databases of different solitary corals i.e. Balanophyllia europaea and Leptopsammia pruvoti, collected in six different sites in different environmental conditions, which introduced a decisive improvement in the results. Nevertheless, the theory of the energy balance in growth ascertains the linear correlation of the two parameters and the independence of the ultimate length L∞ from the influence of environmental covariates, so a further aim of the thesis was to propose a new parameterization based on the ultimate length and parameter c which explicitly describes the part of growth ascribable to site-specific conditions such as environmental factors. We explored the possibility of estimating these parameters characterizing the VBGF new parameterization via the nonlinear hierarchical model. Again there was a general improvement with respect to traditional methods. The results of the two parameterizations were similar, although a very slight improvement was observed in the new one. This is, nevertheless, more suitable from a theoretical point of view when considering environmental covariates.
Resumo:
In the last few years the resolution of numerical weather prediction (nwp) became higher and higher with the progresses of technology and knowledge. As a consequence, a great number of initial data became fundamental for a correct initialization of the models. The potential of radar observations has long been recognized for improving the initial conditions of high-resolution nwp models, while operational application becomes more frequent. The fact that many nwp centres have recently taken into operations convection-permitting forecast models, many of which assimilate radar data, emphasizes the need for an approach to providing quality information which is needed in order to avoid that radar errors degrade the model's initial conditions and, therefore, its forecasts. Environmental risks can can be related with various causes: meteorological, seismical, hydrological/hydraulic. Flash floods have horizontal dimension of 1-20 Km and can be inserted in mesoscale gamma subscale, this scale can be modeled only with nwp model with the highest resolution as the COSMO-2 model. One of the problems of modeling extreme convective events is related with the atmospheric initial conditions, in fact the scale dimension for the assimilation of atmospheric condition in an high resolution model is about 10 Km, a value too high for a correct representation of convection initial conditions. Assimilation of radar data with his resolution of about of Km every 5 or 10 minutes can be a solution for this problem. In this contribution a pragmatic and empirical approach to deriving a radar data quality description is proposed to be used in radar data assimilation and more specifically for the latent heat nudging (lhn) scheme. Later the the nvective capabilities of the cosmo-2 model are investigated through some case studies. Finally, this work shows some preliminary experiments of coupling of a high resolution meteorological model with an Hydrological one.
Resumo:
This thesis presents a universal model of documents and deltas. This model formalize what it means to find differences between documents and to shows a single shared formalization that can be used by any algorithm to describe the differences found between any kind of comparable documents. The main scientific contribution of this thesis is a universal delta model that can be used to represent the changes found by an algorithm. The main part of this model are the formal definition of changes (the pieces of information that records that something has changed), operations (the definitions of the kind of change that happened) and deltas (coherent summaries of what has changed between two documents). The fundamental mechanism tha makes the universal delta model a very expressive tool is the use of encapsulation relations between changes. In the universal delta model, changes are not always simple records of what has changed, they can also be combined into more complex changes that reflects the detection of more meaningful modifications. In addition to the main entities (i.e., changes, operations and deltas), the model describes and defines also documents and the concept of equivalence between documents. As a corollary to the model, there is also an extensible catalog of possible operations that algorithms can detect, used to create a common library of operations, and an UML serialization of the model, useful as a reference when implementing APIs that deal with deltas. The universal delta model presented in this thesis acts as the formal groundwork upon which algorithm can be based and libraries can be implemented. It removes the need to recreate a new delta model and terminology whenever a new algorithm is devised. It also alleviates the problems that toolmakers have when adapting their software to new diff algorithms.
Resumo:
The chronic myeloid leukemia complexity and the difficulties of disease eradication have recently led to the development of drugs which, together with the inhibitors of TK, could eliminate leukemia stem cells preventing the occurrence of relapses in patients undergoing transplantation. The Hedgehog (Hh) signaling pathway positively regulates the self-renewal and the maintenance of leukemic stem cells and not, and this function is evolutionarily conserved. Using Drosophila as a model, we studied the efficacy of the SMO inhibitor drug that inhibit the human protein Smoothened (SMO). SMO is a crucial component in the signal transduction of Hh and its blockade in mammals leads to a reduction in the disease induction. Here we show that administration of the SMO inhibitor to animals has a specific effect directed against the Drosophila ortholog protein, causing loss of quiescence and hematopoietic precursors mobilization. The SMO inhibitor induces in L3 larvae the appearance of melanotic nodules generated as response by Drosophila immune system to the increase of its hemocytes. The same phenotype is induced even by the dsRNA:SMO specific expression in hematopoietic precursors of the lymph gland. The drug action is also confirmed at cellular level. The study of molecular markers has allowed us to demonstrate that SMO inhibitor leads to a reduction of the quiescent precursors and to an increase of the differentiated cells. Moreover administering the inhibitor to heterozygous for a null allele of Smo, we observe a significant increase in the phenotype penetrance compared to administration to wild type animals. This helps to confirm the specific effect of the drug itself. These data taken together indicate that the study of inhibitors of Smo in Drosophila can represent a useful way to dissect their action mechanism at the molecular-genetic level in order to collect information applicable to the studies of the disease in humans.
Resumo:
The atmosphere is a global influence on the movement of heat and humidity between the continents, and thus significantly affects climate variability. Information about atmospheric circulation are of major importance for the understanding of different climatic conditions. Dust deposits from maar lakes and dry maars from the Eifel Volcanic Field (Germany) are therefore used as proxy data for the reconstruction of past aeolian dynamics.rnrnIn this thesis past two sediment cores from the Eifel region are examined: the core SM3 from Lake Schalkenmehren and the core DE3 from the Dehner dry maar. Both cores contain the tephra of the Laacher See eruption, which is dated to 12,900 before present. Taken together the cores cover the last 60,000 years: SM3 the Holocene and DE3 the marine isotope stages MIS-3 and MIS-2, respectively. The frequencies of glacial dust storm events and their paleo wind direction are detected by high resolution grain size and provenance analysis of the lake sediments. Therefore two different methods are applied: geochemical measurements of the sediment using µXRF-scanning and the particle analysis method RADIUS (rapid particle analysis of digital images by ultra-high-resolution scanning of thin sections).rnIt is shown that single dust layers in the lake sediment are characterized by an increased content of aeolian transported carbonate particles. The limestone-bearing Eifel-North-South zone is the most likely source for the carbonate rich aeolian dust in the lake sediments of the Dehner dry maar. The dry maar is located on the western side of the Eifel-North-South zone. Thus, carbonate rich aeolian sediment is most likely to be transported towards the Dehner dry maar within easterly winds. A methodology is developed which limits the detection to the aeolian transported carbonate particles in the sediment, the RADIUS-carbonate module.rnrnIn summary, during the marine isotope stage MIS-3 the storm frequency and the east wind frequency are both increased in comparison to MIS-2. These results leads to the suggestion that atmospheric circulation was affected by more turbulent conditions during MIS-3 in comparison to the more stable atmospheric circulation during the full glacial conditions of MIS-2.rnThe results of the investigations of the dust records are finally evaluated in relation a study of atmospheric general circulation models for a comprehensive interpretation. Here, AGCM experiments (ECHAM3 and ECHAM4) with different prescribed SST patterns are used to develop a synoptic interpretation of long-persisting east wind conditions and of east wind storm events, which are suggested to lead to an enhanced accumulation of sediment being transported by easterly winds to the proxy site of the Dehner dry maar.rnrnThe basic observations made on the proxy record are also illustrated in the 10 m-wind vectors in the different model experiments under glacial conditions with different prescribed sea surface temperature patterns. Furthermore, the analysis of long-persisting east wind conditions in the AGCM data shows a stronger seasonality under glacial conditions: all the different experiments are characterized by an increase of the relative importance of the LEWIC during spring and summer. The different glacial experiments consistently show a shift from a long-lasting high over the Baltic Sea towards the NW, directly above the Scandinavian Ice Sheet, together with contemporary enhanced westerly circulation over the North Atlantic.rnrnThis thesis is a comprehensive analysis of atmospheric circulation patterns during the last glacial period. It has been possible to reconstruct important elements of the glacial paleo climate in Central Europe. While the proxy data from sediment cores lead to a binary signal of the wind direction changes (east versus west wind), a synoptic interpretation using atmospheric circulation models is successful. This shows a possible distribution of high and low pressure areas and thus the direction and strength of wind fields which have the capacity to transport dust. In conclusion, the combination of numerical models, to enhance understanding of processes in the climate system, with proxy data from the environmental record is the key to a comprehensive approach to paleo climatic reconstruction.rn
Resumo:
Spatial prediction of hourly rainfall via radar calibration is addressed. The change of support problem (COSP), arising when the spatial supports of different data sources do not coincide, is faced in a non-Gaussian setting; in fact, hourly rainfall in Emilia-Romagna region, in Italy, is characterized by abundance of zero values and right-skeweness of the distribution of positive amounts. Rain gauge direct measurements on sparsely distributed locations and hourly cumulated radar grids are provided by the ARPA-SIMC Emilia-Romagna. We propose a three-stage Bayesian hierarchical model for radar calibration, exploiting rain gauges as reference measure. Rain probability and amounts are modeled via linear relationships with radar in the log scale; spatial correlated Gaussian effects capture the residual information. We employ a probit link for rainfall probability and Gamma distribution for rainfall positive amounts; the two steps are joined via a two-part semicontinuous model. Three model specifications differently addressing COSP are presented; in particular, a stochastic weighting of all radar pixels, driven by a latent Gaussian process defined on the grid, is employed. Estimation is performed via MCMC procedures implemented in C, linked to R software. Communication and evaluation of probabilistic, point and interval predictions is investigated. A non-randomized PIT histogram is proposed for correctly assessing calibration and coverage of two-part semicontinuous models. Predictions obtained with the different model specifications are evaluated via graphical tools (Reliability Plot, Sharpness Histogram, PIT Histogram, Brier Score Plot and Quantile Decomposition Plot), proper scoring rules (Brier Score, Continuous Rank Probability Score) and consistent scoring functions (Root Mean Square Error and Mean Absolute Error addressing the predictive mean and median, respectively). Calibration is reached and the inclusion of neighbouring information slightly improves predictions. All specifications outperform a benchmark model with incorrelated effects, confirming the relevance of spatial correlation for modeling rainfall probability and accumulation.
Resumo:
The scope of this project is to study the effectiveness of building information modelling (BIM) in performing life cycle assessment in a building. For the purposes of the study will be used “Revit” which is a BIM software and Tally which is an LCA tool integrated in Revit. The project is divided in six chapters. The first chapter consists of a theoretical introduction into building information modelling and its connection to life cycle assessment. The second chapter describes the characteristics of building information modelling (BIM). In addition, a comparison has been made with the traditional architectural, engineering and construction business model and the benefits to shift into BIM. In the third chapter it will be a review of the most well-known and available BIM software in the market. In chapter four life cycle assessment (LCA) will be described in general and later on specifically for the purpose of the case study that will be used in the following chapter. Moreover, the tools that are available to perform an LCA will be reviewed. Chapter five will present the case study that consists of a model in a BIM software (Revit) and the LCA performed by Tally, an LCA tool integrated into Revit. In the last chapter will be a discussion of the results that were obtained, the limitation and the possible future improvement in performing life cycle assessment (LCA) in a BIM model.