13 results for Statistical Language Model

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance:

90.00%

Abstract:

This study analyses changes in the rural landscape. In particular, it focuses on understanding the driving forces acting on the rural built environment, using a statistical spatial model implemented through GIS techniques. It is well known that the study of landscape changes is essential for conscious decision-making in land planning. A literature review reveals a general lack of studies dealing with the modelling of the rural built environment; hence a theoretical modelling approach for this purpose is needed. Advances in technology and modernization in building construction and agriculture have gradually changed the rural built environment. In addition, urbanization has led to the construction of new volumes beside abandoned or derelict rural buildings. Consequently, two types of transformation dynamics mainly affecting the rural built environment can be observed: the conversion of rural buildings and the increase in the number of buildings. The specific aim of this study is to propose a methodology for developing a spatial model that allows the identification of the driving forces that acted on building allocation. In fact, one of the most concerning dynamics nowadays is the irrational sprawl of buildings across the landscape. The proposed methodology is composed of conceptual steps that cover different aspects of the development of a spatial model: the selection of a response variable that best describes the phenomenon under study, the identification of possible driving forces, the sampling methodology for data collection, the choice of the most suitable algorithm in relation to the statistical theory and method used, and the calibration and evaluation of the model.
Different combinations of factors in various parts of the territory generated more or less favourable conditions for building allocation, and the existence of buildings is the evidence of such an optimum. Conversely, the absence of buildings expresses a combination of agents that is not suitable for building allocation. The presence or absence of buildings can therefore be adopted as an indicator of such driving conditions, since it expresses the action of the driving forces in the land suitability sorting process. The existence of a correlation between site selection and hypothetical driving forces, evaluated by means of modelling techniques, provides evidence of which driving forces are involved in the allocation dynamic and insight into their level of influence on the process. GIS software, by means of spatial analysis tools, makes it possible to associate the concepts of presence and absence with point features, generating a point process. The presence or absence of buildings at given sites expresses the interaction of these driving factors. Presence points represent the locations of real existing buildings; absence points represent locations where no buildings exist, and so they are generated by a stochastic mechanism. Possible driving forces are selected, and the existence of a causal relationship with building allocation is assessed through a spatial model. The adoption of empirical statistical models provides a mechanism for analysing the explanatory variables and for identifying the key driving variables behind the site selection process for new building allocation. The model developed by following the methodology is applied to a case study to test its validity. The study area is the New District of Imola, which is characterized by a prevailing agricultural production vocation and where transformation dynamics have occurred intensively.
The development of the model involved the identification of predictive variables (related to the geomorphologic, socio-economic, structural and infrastructural systems of the landscape) capable of representing the driving forces responsible for landscape changes. The model is calibrated on spatial data for the periurban and rural parts of the study area over the 1975-2005 period by means of a generalized linear model. The output of the model fit is a continuous grid surface whose cells take values from 0 to 1, expressing the probability of building occurrence across the rural and periurban parts of the study area. Hence the response variable assesses the changes in the rural built environment that occurred in that time interval and is related to the selected explanatory variables by means of a generalized linear model using logistic regression. By comparing the probability map obtained from the model with the actual rural building distribution in 2005, the interpretation capability of the model can be evaluated. The proposed model can also be applied to the interpretation of trends that occurred in other study areas and in different time intervals, depending on the availability of data. The use of data that are suitable in terms of time, information and spatial resolution, and the costs related to data acquisition, pre-processing and survey, are among the most critical aspects of model implementation. Future in-depth studies can focus on using the proposed model to predict short/medium-range future scenarios for the rural built environment distribution in the study area. In order to predict future scenarios it is necessary to assume that the driving forces do not change and that their levels of influence within the model are not far from those assessed for the time interval used for the calibration.
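The presence/absence modelling described above can be illustrated with a minimal sketch of a binomial GLM with a logit link, fitted by plain gradient ascent. The single driver ("distance to road"), the synthetic data, the learning rate and all function names are assumptions made for illustration only; the abstract does not state which software or fitting routine the thesis used.

```python
import math
import random


def sigmoid(z):
    """Logistic link: maps a linear predictor to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit a logistic regression by batch gradient ascent on the log-likelihood.

    X: list of feature rows (explanatory variables, assumed standardized).
    y: list of 0/1 labels (1 = building present, 0 = generated absence point).
    Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, label in zip(X, y):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
            err = label - p
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def predict(beta, row):
    """Probability of building occurrence for one cell or point."""
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))


# Synthetic data (hypothetical): one standardized driver, distance to road.
random.seed(0)
X, y = [], []
for _ in range(400):
    dist = random.uniform(-2.0, 2.0)
    p_true = sigmoid(0.5 - 2.0 * dist)  # buildings are likelier near roads
    X.append([dist])
    y.append(1 if random.random() < p_true else 0)

beta = fit_logistic(X, y)
near, far = predict(beta, [-1.5]), predict(beta, [1.5])
print(beta, near, far)
```

Evaluating `predict` on every cell of a raster of explanatory variables would give the kind of 0-to-1 probability surface the abstract describes.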

Relevance:

80.00%

Abstract:

Although the debate of what data science is has a long history and has not reached a complete consensus yet, Data Science can be summarized as the process of learning from data. Guided by the above vision, this thesis presents two independent data science projects developed in the scope of multidisciplinary applied research. The first part analyzes fluorescence microscopy images typically produced in life science experiments, where the objective is to count how many marked neuronal cells are present in each image. Aiming to automate the task for supporting research in the area, we propose a neural network architecture tuned specifically for this use case, cell ResUnet (c-ResUnet), and discuss the impact of alternative training strategies in overcoming particular challenges of our data. The approach provides good results in terms of both detection and counting, showing performance comparable to the interpretation of human operators. As a meaningful addition, we release the pre-trained model and the Fluorescent Neuronal Cells dataset collecting pixel-level annotations of where neuronal cells are located. In this way, we hope to help future research in the area and foster innovative methodologies for tackling similar problems. The second part deals with the problem of distributed data management in the context of LHC experiments, with a focus on supporting ATLAS operations concerning data transfer failures. In particular, we analyze error messages produced by failed transfers and propose a Machine Learning pipeline that leverages the word2vec language model and K-means clustering. This provides groups of similar errors that are presented to human operators as suggestions of potential issues to investigate. The approach is demonstrated on one full day of data, showing promising ability in understanding the message content and providing meaningful groupings, in line with previously reported incidents by human operators.
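The clustering stage of such a pipeline can be sketched as follows. This is a toy illustration under stated assumptions: the word2vec embedding is replaced here by a plain bag-of-words vector (a real pipeline would, for example, average word2vec token vectors), the error messages are invented, and the deterministic farthest-point initialization is a choice made for reproducibility, not something taken from the thesis.

```python
def tokenize(message):
    return message.lower().split()


def vectorize(messages):
    """Bag-of-words counts over a shared vocabulary (stand-in for word2vec)."""
    vocab = sorted({tok for m in messages for tok in tokenize(m)})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for m in messages:
        v = [0.0] * len(vocab)
        for tok in tokenize(m):
            v[index[tok]] += 1.0
        vectors.append(v)
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Farthest-point initialization: deterministic, no random seed needed."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, max_iterations=50):
    """Lloyd's algorithm; returns one cluster label per point."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented error messages, for illustration only.
messages = [
    "connection timed out to host",
    "connection timed out to server",
    "connection timed out",
    "checksum mismatch on file",
    "checksum mismatch detected on file",
    "checksum mismatch",
]
labels = kmeans(vectorize(messages), k=2)
print(labels)
```

Each resulting group could then be shown to an operator as one candidate issue to investigate.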

Relevance:

80.00%

Abstract:

The rapid progression of biomedical research coupled with the explosion of scientific literature has generated an exigent need for efficient and reliable systems of knowledge extraction. This dissertation contends with this challenge through a concentrated investigation of digital health, Artificial Intelligence, and specifically Machine Learning and Natural Language Processing's (NLP) potential to expedite systematic literature reviews and refine the knowledge extraction process. The surge of COVID-19 complicated the efforts of scientists, policymakers, and medical professionals in identifying pertinent articles and assessing their scientific validity. This thesis presents a substantial solution in the form of the COKE Project, an initiative that interlaces machine reading with the rigorous protocols of Evidence-Based Medicine to streamline knowledge extraction. In the framework of the COKE (“COVID-19 Knowledge Extraction framework for next-generation discovery science”) Project, this thesis aims to underscore the capacity of machine reading to create knowledge graphs from scientific texts. The project is remarkable for its innovative use of NLP techniques such as a BERT + bi-LSTM language model. This combination is employed to detect and categorize elements within medical abstracts, thereby enhancing the systematic literature review process. The COKE project's outcomes show that NLP, when used in a judiciously structured manner, can significantly reduce the time and effort required to produce medical guidelines. These findings are particularly salient during times of medical emergency, like the COVID-19 pandemic, when quick and accurate research results are critical.

Relevance:

40.00%

Abstract:

Galaxy clusters occupy a special position in the cosmic hierarchy as they are the largest bound structures in the Universe. There is now general agreement on a hierarchical picture for the formation of cosmic structures, in which galaxy clusters are supposed to form by accretion of matter and merging between smaller units. During merger events, shocks are driven by the gravity of the dark matter in the diffuse baryonic component, which is heated up to the observed temperature. Radio and hard X-ray observations have discovered non-thermal components mixed with the thermal Intra Cluster Medium (ICM), and this is of great importance as it calls for a “revision” of the physics of the ICM. The bulk of present information comes from radio observations, which have discovered an increasing number of Mpc-sized emissions from the ICM: Radio Halos (at the cluster center) and Radio Relics (at the cluster periphery). These sources are due to synchrotron emission from ultra-relativistic electrons diffusing through µG turbulent magnetic fields. Radio Halos are the most spectacular evidence of non-thermal components in the ICM, and understanding the origin and evolution of these sources represents one of the most challenging goals of the theory of the ICM. Cluster mergers are the most energetic events in the Universe, and a fraction of the energy dissipated during these mergers could be channelled into the amplification of the magnetic fields and into the acceleration of high-energy particles via shocks and turbulence driven by these mergers. Present observations of Radio Halos (and possibly of hard X-rays) can be best interpreted in terms of the re-acceleration scenario, in which MHD turbulence injected during these cluster mergers re-accelerates high-energy particles in the ICM.
The physics involved in this scenario is very complex and model details are difficult to test; however, this model clearly predicts some simple properties of Radio Halos (and of the resulting IC emission in the hard X-ray band) which are almost independent of the details of the adopted physics. In particular, in the re-acceleration scenario MHD turbulence is injected and dissipated during cluster mergers, and thus Radio Halos (and also the resulting hard X-ray IC emission) should be transient phenomena (with a typical lifetime ≲ 1 Gyr) associated with dynamically disturbed clusters. The physics of the re-acceleration scenario should produce an unavoidable cut-off in the spectrum of the re-accelerated electrons, which is due to the balance between turbulent acceleration and radiative losses. The energy at which this cut-off occurs, and thus the maximum frequency at which synchrotron radiation is produced, depends essentially on the efficiency of the acceleration mechanism, so that observations at high frequencies are expected to catch only the most efficient phenomena while, in principle, low-frequency radio surveys may find these phenomena to be much more common in the Universe. These basic properties should leave an important imprint on the statistical properties of Radio Halos (and of non-thermal phenomena in general) which, however, have not yet been addressed by present modelling. The main focus of this PhD thesis is to calculate, for the first time, the expected statistics of Radio Halos in the context of the re-acceleration scenario. In particular, we shall address the following main questions:
• Is it possible to model “self-consistently” the evolution of these sources together with that of the parent clusters?
• How is the occurrence of Radio Halos expected to change with cluster mass and to evolve with redshift? How does the efficiency of catching Radio Halos in galaxy clusters change with the observing radio frequency?
• How many Radio Halos are expected to form in the Universe? At which redshift is the bulk of these sources expected?
• Is it possible to reproduce in the re-acceleration scenario the observed occurrence and number of Radio Halos in the Universe and the observed correlations between thermal and non-thermal properties of galaxy clusters?
• Is it possible to constrain the magnetic field intensity and profile in galaxy clusters and the energetics of turbulence in the ICM from the comparison between model expectations and observations?
Several astrophysical ingredients are necessary to model the evolution and statistical properties of Radio Halos in the context of the re-acceleration model and to address the points given above. For this reason we devote some space in this PhD thesis to reviewing the aspects of the physics of the ICM that are relevant to our goals. In Chapt. 1 we discuss the physics of galaxy clusters and, in particular, the cluster formation process; in Chapt. 2 we review the main observational properties of non-thermal components in the ICM; and in Chapt. 3 we focus on the physics of magnetic fields and of particle acceleration in galaxy clusters. As a relevant application, the theory of Alfvénic particle acceleration is applied in Chapt. 4, where we report the most important results from calculations we have done in the framework of the re-acceleration scenario. In this Chapter we show that a fraction of the energy of fluid turbulence driven in the ICM by cluster mergers can be channelled into the injection of Alfvén waves at small scales, and that these waves can efficiently re-accelerate particles and trigger Radio Halos and hard X-ray emission. The main part of this PhD work, the calculation of the statistical properties of Radio Halos and non-thermal phenomena as expected in the context of the re-acceleration model and their comparison with observations, is presented in Chapts. 5, 6, 7 and 8.
In Chapt. 5 we present a first approach to semi-analytical calculations of the statistical properties of giant Radio Halos. The main goal of this Chapter is to model cluster formation, the injection of turbulence in the ICM and the resulting particle acceleration process. We adopt the semi-analytic extended Press & Schechter (PS) theory to follow the formation of a large synthetic population of galaxy clusters and assume that during a merger a fraction of the PdV work done by the infalling subclusters in passing through the most massive one is injected in the form of magnetosonic waves. Then the processes of stochastic acceleration of the relativistic electrons by these waves and the properties of the ensuing synchrotron (Radio Halos) and inverse Compton (IC, hard X-ray) emission of merging clusters are computed under the assumption of a constant rms average magnetic field strength in the emitting volume. The main finding of these calculations is that giant Radio Halos are naturally expected only in the more massive clusters, and that the expected fraction of clusters with Radio Halos is consistent with the observed one. In Chapt. 6 we extend the previous calculations by including a scaling of the magnetic field strength with cluster mass. The inclusion of this scaling allows us to derive the expected correlations between the synchrotron radio power of Radio Halos and the X-ray properties (T, L_X) and mass of the hosting clusters. For the first time, we show that these correlations, calculated in the context of the re-acceleration model, are consistent with the observed ones for typical µG strengths of the average B intensity in massive clusters. The calculations presented in this Chapter allow us to derive the evolution of the probability of forming Radio Halos as a function of cluster mass and redshift.
The most relevant finding presented in this Chapter is that the luminosity functions of giant Radio Halos at 1.4 GHz are expected to peak around a radio power of ~ 10^24 W/Hz and to flatten (or cut off) at lower radio powers because of the decrease of the electron re-acceleration efficiency in smaller galaxy clusters. In Chapt. 6 we also derive the expected number counts of Radio Halos and compare them with available observations: we claim that ~ 100 Radio Halos in the Universe can be observed at 1.4 GHz with deep surveys, while more than 1000 Radio Halos are expected to be discovered in the near future by LOFAR at 150 MHz. This is the first (and so far the only) model expectation for the number counts of Radio Halos at lower frequencies, and it allows future radio surveys to be designed. Based on the results of Chapt. 6, in Chapt. 7 we present work in progress on a “revision” of the occurrence of Radio Halos. We combine past results from the NVSS radio survey (z ~ 0.05-0.2) with our ongoing GMRT Radio Halos Pointed Observations of 50 X-ray luminous galaxy clusters (at z ~ 0.2-0.4) and discuss the possibility of testing our model expectations with the number counts of Radio Halos at z ~ 0.05-0.4. The most relevant limitation of the calculations presented in Chapts. 5 and 6 is the assumption of an “averaged” size of Radio Halos, independent of their radio luminosity and of the mass of the parent clusters. This assumption cannot be relaxed in the context of the PS formalism used to describe the formation process of clusters, while a more detailed analysis of the physics of cluster mergers and of the injection process of turbulence in the ICM would require an approach based on numerical (possibly MHD) simulations of a very large volume of the Universe, which is however well beyond the aim of this PhD thesis.
On the other hand, in Chapt. 8 we report our discovery of novel correlations between the size (R_H) of Radio Halos and their radio power, and between R_H and the cluster mass within the Radio Halo region, M_H. In particular, this last “geometrical” M_H-R_H correlation allows us to “observationally” overcome the limitation of the “average” size of Radio Halos. Thus in this Chapter, by making use of this “geometrical” correlation and of a simplified form of the re-acceleration model based on the results of Chapts. 5 and 6, we are able to discuss the expected correlations between the synchrotron power and the thermal cluster quantities relative to the radio-emitting region. This is a powerful new tool of investigation, and we show that all the observed correlations (P_R-R_H, P_R-M_H, P_R-T, P_R-L_X, ...) now become well understood in the context of the re-acceleration model. In addition, we find that observationally the size of Radio Halos scales non-linearly with the virial radius of the parent cluster, and this immediately means that the fraction of the cluster volume which is radio emitting increases with cluster mass and thus that the non-thermal component in clusters is not self-similar.

Relevance:

30.00%

Abstract:

This thesis proposes a new document model, according to which any document can be segmented into independent components and transformed into a pattern-based projection that uses only a very small set of objects and composition rules. The point is that such a normalized document expresses the same fundamental information as the original one, in a simple, clear and unambiguous way. The central part of my work consists of discussing that model, investigating how a digital document can be segmented, and how a segmented version can be used to implement advanced conversion tools. I present seven patterns which are versatile enough to capture the most relevant document structures, and whose minimality and rigour make that implementation possible. The abstract model is then instantiated into an actual markup language, called IML. IML is a general and extensible language, which basically adopts an XHTML syntax and is able to capture, a posteriori, only the content of a digital document. It is compared with other languages and proposals, in order to clarify its role and objectives. Finally, I present some systems built upon these ideas. These applications are evaluated in terms of advantages for users, workflow improvements and impact on the overall quality of the output. In particular, they cover heterogeneous content management processes: from web editing to collaboration (IsaWiki and WikiFactory), from e-learning (IsaLearning) to professional printing (IsaPress).

Relevance:

30.00%

Abstract:

Statistical modelling and statistical learning theory are two powerful analytical frameworks for analyzing signals and developing efficient processing and classification algorithms. In this thesis, these frameworks are applied to modelling and processing biomedical signals in two different contexts: ultrasound medical imaging systems, and the analysis and modelling of primate neural activity. In the context of ultrasound medical imaging, two main applications are explored: deconvolution of signals measured from an ultrasonic transducer, and automatic image segmentation and classification of prostate ultrasound scans. In the former application, a stochastic model of the radio frequency signal measured from an ultrasonic transducer is derived. This model is then employed to develop, in a statistical framework, a regularized deconvolution procedure for enhancing signal resolution. In the latter application, different statistical models are used to characterize images of prostate tissues, extracting different features. These features are then used to segment the images into regions of interest by means of an automatic procedure based on a statistical model of the extracted features. Finally, machine learning techniques are used for automatic classification of the different regions of interest. In the context of neural activity signals, an example of a bio-inspired dynamical network was developed to help in studies of motor-related processes in the brain of primate monkeys. The presented model aims to mimic the abstract functionality of a cell population in the 7a parietal region of primate monkeys during the execution of learned behavioural tasks.

Relevance:

30.00%

Abstract:

Aim: To evaluate the early response to treatment with an antiangiogenic drug (sorafenib) in a heterotopic murine model of hepatocellular carcinoma (HCC) using ultrasonographic molecular imaging. Material and Methods: The xenograft model was established by injecting a suspension of HuH7 cells subcutaneously in 19 nude mice. When tumors reached a mean diameter of 5-10 mm, the mice were divided into two groups (treatment and vehicle). The treatment group received sorafenib (62 mg/kg) by daily oral gavage for 14 days. Molecular imaging was performed using contrast-enhanced ultrasound (CEUS), by injecting into the mouse venous circulation a suspension of VEGFR-2 targeted microbubbles (BR55, kind gift of Bracco Swiss, Geneva, Switzerland). Video clips were acquired for 6 minutes, then microbubbles (MBs) were destroyed by a high mechanical index (MI) impulse, and another minute was recorded to evaluate residual circulating MBs. The US protocol was repeated at days 0, +2, +4, +7 and +14 from the beginning of treatment administration. Video clips were analyzed using dedicated software (Sonotumor, Bracco Swiss) to quantify the signal of the contrast agent. Time/intensity curves were obtained, and the difference between the mean MB signal before and after the high MI impulse (Differential Targeted Enhancement, dTE) was calculated. dTE is a numeric value in arbitrary units proportional to the amount of bound MBs. At day +14 mice were euthanized and the tumors analyzed for VEGFR-2, pERK and CD31 tissue levels using western blot analysis. Results: dTE values decreased from day 0 to day +14 in both the treatment and vehicle groups, and they were statistically higher in the vehicle group than in the treatment group at day +2, day +7 and day +14. With respect to the degree of tumor volume increase, measured as growth percentage delta (GPD), the treatment group was divided into two sub-groups: non-responders (GPD>350%) and responders (GPD<200%).
In the same way, the vehicle group was divided into a slow growth group (GPD<400%) and a fast growth group (GPD>900%). dTE values at day 0 (immediately before treatment start) were higher in the non-responders than in the responders group, with a statistically significant difference at day +2, whereas dTE values were higher in the fast growth group than in the slow growth group only at day 0. A significant positive correlation was found between VEGFR-2 tissue levels and dTE values, confirming that the level of BR55 tissue enhancement reflects the amount of tissue VEGF receptor. Conclusions: The present findings show that, at least in murine experimental models, CEUS with BR55 is feasible and appears to be a useful tool for predicting tumor growth and response to sorafenib treatment in xenograft HCC.
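The dTE definition given in the Methods (mean MB signal before the high-MI impulse minus mean MB signal after it) can be written down as a short sketch. The averaging window length, the function name and the synthetic time/intensity curve are assumptions for illustration; the abstract does not say how the dedicated software chooses its windows.

```python
from statistics import mean


def differential_targeted_enhancement(times_s, signal, burst_time_s, window_s=30.0):
    """dTE: mean signal just before the burst minus mean signal just after it.

    times_s, signal: samples of a time/intensity curve (arbitrary units).
    burst_time_s: time of the high mechanical index impulse.
    window_s: averaging window on each side (assumed value, not from the source).
    """
    before = [v for t, v in zip(times_s, signal)
              if burst_time_s - window_s <= t < burst_time_s]
    after = [v for t, v in zip(times_s, signal)
             if burst_time_s < t <= burst_time_s + window_s]
    if not before or not after:
        raise ValueError("no samples in the pre- or post-burst window")
    return mean(before) - mean(after)


# Synthetic curve (hypothetical values): 6 min of acquisition, burst at 360 s,
# then 1 min of recording with only circulating microbubbles left.
times = list(range(0, 421))
signal = [50.0 if t < 360 else 10.0 for t in times]
dte = differential_targeted_enhancement(times, signal, burst_time_s=360)
print(dte)
```

Because the post-burst signal comes only from circulating microbubbles, the difference is read as proportional to the bound fraction, which is how the abstract interprets dTE.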

Relevance:

30.00%

Abstract:

One of the main targets of the CMS experiment is to search for the Standard Model Higgs boson. The 4-lepton channel (from the Higgs decay H → ZZ → 4ℓ, ℓ = e, μ) is one of the most promising. The analysis is based on the identification of two opposite-sign, same-flavor lepton pairs: leptons are required to be isolated and to come from the same primary vertex. The Higgs would be statistically revealed by the presence of a resonance peak in the 4-lepton invariant mass distribution. The 4-lepton analysis at CMS is presented, covering its most important aspects: lepton identification, isolation variables, impact parameter, kinematics, event selection, background control and statistical analysis of results. The search leads to evidence for the presence of a signal with a statistical significance of more than four standard deviations. The excess of data with respect to the background-only predictions indicates the presence of a new boson, with a mass of about 126 GeV/c², decaying to two Z bosons, whose characteristics are compatible with those of the SM Higgs.
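The quantity in which the resonance peak is sought, the 4-lepton invariant mass, follows from the summed four-momenta via m² = E² − |p|² (natural units, c = 1). The sketch below only illustrates that formula; the four massless leptons and their momenta are a made-up toy event, not CMS data, and the function name is hypothetical.

```python
import math


def invariant_mass(four_vectors):
    """Invariant mass of a system of particles.

    four_vectors: iterable of (E, px, py, pz) tuples in GeV, with c = 1.
    """
    vectors = list(four_vectors)
    e = sum(v[0] for v in vectors)
    px = sum(v[1] for v in vectors)
    py = sum(v[2] for v in vectors)
    pz = sum(v[3] for v in vectors)
    m2 = e * e - (px * px + py * py + pz * pz)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(m2, 0.0))


# Toy event (hypothetical): four massless leptons whose momenta cancel.
leptons = [
    (31.5, 31.5, 0.0, 0.0),
    (31.5, -31.5, 0.0, 0.0),
    (31.5, 0.0, 31.5, 0.0),
    (31.5, 0.0, -31.5, 0.0),
]
m4l = invariant_mass(leptons)
print(m4l)
```

Filling a histogram with this value for every selected event gives the distribution in which a peak would appear.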

Relevance:

30.00%

Abstract:

In this thesis, we extend some ideas of statistical physics to describe the properties of human mobility. By using a database containing GPS measures of individual paths (position, velocity and covered space at a spatial scale of 2 km or a time scale of 30 s), which includes 2% of the private vehicles in Italy, we succeed in determining some statistical empirical laws pointing out "universal" characteristics of human mobility. By developing simple stochastic models that suggest possible explanations of the empirical observations, we are able to indicate the key quantities and cognitive features that rule individuals' mobility. To understand the features of individual dynamics, we have studied different aspects of urban mobility from a physical point of view. We discuss the implications of Benford's law emerging from the distribution of times elapsed between successive trips. We observe how the daily travel-time budget is related to many aspects of the urban environment, and describe how the daily mobility budget is then spent. We link the scaling properties of individual mobility networks to the inhomogeneous average durations of the activities that are performed, and those of the networks describing people's common use of space to the fractional dimension of the urban territory. We study entropy measures of individual mobility patterns, showing that they carry almost the same information as the related mobility networks, but are also influenced by a hierarchy among the activities performed. We discover that Wardrop's principles are violated, as drivers have only incomplete information on the traffic state and therefore rely on knowledge of the average travel times. We propose an assimilation model to solve the intrinsic scattering of GPS data on the street network, permitting the real-time reconstruction of the traffic state at an urban scale.
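Benford's law, mentioned above for the times elapsed between successive trips, predicts that the leading digit d occurs with probability log10(1 + 1/d). A minimal sketch of how observed leading-digit frequencies can be compared with that prediction follows; the powers of two stand in for real inter-trip times, which are not available here, and the function names are illustrative.

```python
import math
from collections import Counter


def benford_expected():
    """Benford's law: P(d) = log10(1 + 1/d) for leading digits 1..9."""
    return {d: math.log10(1.0 + 1.0 / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a positive number."""
    # Scientific notation puts the first significant digit first.
    return int(f"{float(x):.15e}"[0])


def leading_digit_frequencies(values):
    """Observed relative frequency of each leading digit 1..9."""
    counts = Counter(leading_digit(v) for v in values if v > 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Stand-in data: powers of two are a classic Benford-like sequence.
sample = [2 ** n for n in range(200)]
expected = benford_expected()
observed = leading_digit_frequencies(sample)
for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```

With real data, the same comparison would be run on the elapsed times between successive trips, in whatever time unit they are recorded.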

Relevance:

30.00%

Abstract:

This study evaluated the farmers’ communities’ approach to the Slow Food vision, their perception of the role of Slow Food in supporting their activity, and their appreciation of and expectations from participating in the Mother Earth event. The Unified Theory of Acceptance and Use of Technology (UTAUT) model was adopted in an agro-food sector context. A survey was conducted, and 120 questionnaires were collected from farmers attending Mother Earth in Turin in 2010. The descriptive statistical analysis showed that both Slow Food membership and participation in the Mother Earth Meeting were much appreciated for the support provided to the farmers’ business and the contribution to a more sustainable and fair development. A positive social, environmental and psychological impact on farmers also emerged. The results also showed an interesting perspective on the possible universality of the Slow Food and Mother Earth values. Farmers declared that Slow Food supports them by preserving biodiversity, orienting them towards the use of local resources, and reducing chemical inputs. Many farmers mentioned language/culture and administrative/bureaucratic issues as obstacles to being a member of the movement and to participating in the event. Participation in Mother Earth gives an opportunity to exchange information with other farmers’ communities and to take part in seminars and debates, which is helpful for their business development. The absolute majority of positive answers associated with the farmers’ willingness to relate to Slow Food and to participate in the next Mother Earth editions negatively influenced the UTAUT model results. A factor analysis showed that the variables associated with the UTAUT model constructs Performance Expectancy and Effort Expectancy were consistent and able to explain the construct variability, and that their measurement was reliable. Their inclusion in a simpler Technology Acceptance Model could be considered in future research.

Relevance:

30.00%

Abstract:

Uncertainty in the determination of the stratigraphic profile of natural soils is one of the main problems in geotechnics, in particular for landslide characterization and modeling. The study deals with a new approach to geotechnical modeling which relies on the stochastic generation of different soil layer distributions, following a Boolean logic; the method is therefore called BoSG (Boolean Stochastic Generation). In this way, it is possible to randomize the presence of a specific material interdigitated in a uniform matrix. In building a geotechnical model it is common to discard some stratigraphic data in order to simplify the model itself, assuming that the significance of the results of the modeling procedure will not be affected. With the proposed technique it is possible to quantify the error associated with this simplification. Moreover, it can be used to determine the most significant zones, where any further investigations and surveys would be most effective for building the geotechnical model of the slope. The commercial software FLAC was used for the 2D and 3D geotechnical models. The distribution of the materials was randomized through a purpose-written MATLAB program that automatically generates text files, each of them representing a specific soil configuration. In addition, a routine was designed to automate the FLAC computations with the different data files in order to maximize the number of samples. The methodology is applied to a simplified slope in 2D, a simplified slope in 3D and an actual landslide, namely the Mortisa mudslide (Cortina d’Ampezzo, BL, Italy). However, it could be extended to numerous different cases, especially for hydrogeological analysis and landslide stability assessment, in different geological and geomorphological contexts.
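The idea of randomizing the presence of one material inside a uniform matrix and emitting one text file per configuration can be sketched roughly as below. This is not the author's MATLAB program and does not reproduce the FLAC input format: the independent per-cell draw, the grid size, the inclusion fraction and the 0/1 text encoding are all assumptions chosen to keep the example small.

```python
import random


def generate_configuration(n_rows, n_cols, inclusion_fraction, rng):
    """One Boolean grid: True where the second material is present."""
    return [[rng.random() < inclusion_fraction for _ in range(n_cols)]
            for _ in range(n_rows)]


def to_text(grid):
    """Serialize a grid as lines of 0/1 characters (hypothetical format)."""
    return "\n".join("".join("1" if cell else "0" for cell in row) for row in grid)


def generate_samples(n_samples, n_rows, n_cols, inclusion_fraction, seed=0):
    """Generate reproducible text representations of random soil configurations."""
    rng = random.Random(seed)
    return [to_text(generate_configuration(n_rows, n_cols, inclusion_fraction, rng))
            for _ in range(n_samples)]


samples = generate_samples(n_samples=5, n_rows=4, n_cols=8, inclusion_fraction=0.3, seed=1)
print(samples[0])
```

Each string could be written to its own file and passed to the numerical model in a batch loop, which is the role the abstract assigns to the automation routine.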

Relevance:

30.00%

Abstract:

This work studies the immigration phenomenon, with emphasis on the aspects related to the integration of an immigrant population into a host population. The aim of this work is to show the forecasting ability of a recent finding in which the behavior of integration quantifiers was analyzed and investigated with a mathematical model originating in statistical physics (a generalization of the monomer-dimer model). After providing a detailed literature review of the model, we show that not only is such a model able to identify the social mechanism that drives a particular integration process, but it also provides correct forecasts. The research reported here shows that the proposed model of integration and its forecasting framework are simple and effective tools for reducing uncertainty about how integration phenomena emerge and how they are likely to develop in response to increased migration levels in the future.