8 results for Prediction model

in Universidade do Minho


Relevance:

70.00%

Abstract:

This paper aims to develop a collision prediction model for three-leg junctions located on national roads (NR) in Northern Portugal. The focus is twofold. The first aim is to identify the factors that contribute to collision-type crashes at those locations, mainly factors related to road geometric consistency, since the literature on them is scarce. The second is to investigate how three modeling methods affect the factors retained in those models: generalized estimating equations, random-effects negative binomial models and random-parameters negative binomial models. The database included data published between 2008 and 2010 for 177 three-leg junctions. The contributing factors were split into three groups, which were tested sequentially for each of the adopted models. The first group was traffic only. The second added the geometric characteristics of the junctions within their area of influence. The last added factors describing the difference between the geometric characteristics of the segments bordering the junctions' area of influence and those of the segment included in that area. The choice of the best modeling technique was supported by a cross-validation carried out to identify the best model for each of the three sets of contributing factors. Random-parameters negative binomial models performed best in this process. In the best models obtained with every modeling technique, the characteristics of the road environment, including proxy measures for geometric consistency, contribute significantly to the number of collisions, along with traffic volume. Two kinds of variables proved relevant. The first describes the junctions and the national highway segments in their area of influence. The second captures how those characteristics vary relative to the roadway segments that border that area of influence. There is therefore a clear need to incorporate the effect of geometric consistency into safety studies of three-leg junctions.
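For readers less familiar with this model family, a generic random-parameters negative binomial specification (NB2 form) is sketched below. The covariates x_ik are illustrative assumptions; they could be, for example, log traffic volume or a geometric-consistency measure. The abstract does not give the authors' exact specification. A fixed parameter corresponds to σ_k = 0. The random-effects variant instead adds a junction-level random term to the intercept.

```latex
\begin{aligned}
y_i &\sim \mathrm{NB}(\mu_i, \alpha), \qquad \mathrm{Var}(y_i) = \mu_i + \alpha\,\mu_i^{2},\\
\ln \mu_i &= \beta_0 + \sum_{k=1}^{K} \beta_{ik}\, x_{ik},\\
\beta_{ik} &= \beta_k + \sigma_k\, \varphi_{ik}, \qquad \varphi_{ik} \sim \mathcal{N}(0, 1).
\end{aligned}
```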

Relevance:

60.00%

Abstract:

Doctoral Programme in Mathematics and Applications.

Relevance:

60.00%

Abstract:

Type 2 diabetes (T2D) has been suggested to be a risk factor for multiple myeloma (MM), but the relationship between the two traits is still not well understood. This study had two aims. The first was to evaluate whether 58 common T2D variants identified by genome-wide association studies (GWAS) influence the risk of developing MM. The second was to determine whether predictive models built with these variants might help to predict disease risk. We conducted a case–control study including 1420 MM patients and 1858 controls ascertained through the International Multiple Myeloma (IMMEnSE) consortium. Some subjects had a significantly increased risk of MM (odds ratio (OR) = 1.32–2.13): those carrying the KCNQ1 rs2237892 T allele or the CDKN2A-2B rs2383208 G/G, IGF1 rs35767 T/T and MADD rs7944584 T/T genotypes. Others showed a significantly decreased risk of developing the disease (OR = 0.76–0.85): those carrying the KCNJ11 rs5215 C, KCNJ11 rs5219 T and THADA rs7578597 C alleles or the FTO rs8050136 A/A and LTA rs1041981 C/C genotypes. Interestingly, a prediction model including the T2D-related variants associated with MM risk predicted the disease with significantly better discriminatory ability than a model without genetic information (area under the curve (AUC) = 0.645 vs AUC = 0.629; P = 4.05 × 10⁻⁶). A gender-stratified analysis also revealed significant effect modification by gender for the ADAM30 rs2641348 and NOTCH2 rs10923931 variants (P for interaction = 0.001 and 0.0004, respectively). Men carrying the ADAM30 rs2641348 C and NOTCH2 rs10923931 T alleles had a significantly decreased risk of MM. Women showed an opposite but non-significant effect (OR in men = 0.71 and 0.66 vs OR in women = 1.22 and 1.15, respectively). These results suggest that T2D-related variants may influence the risk of developing MM and that genotyping them might help to improve MM risk prediction models.
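The AUC values above summarize discriminatory ability. The AUC equals the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The Python sketch below implements that rank-based (Mann–Whitney) definition. The scores are hypothetical, and the abstract does not state which software or estimator the authors used.

```python
from itertools import product


def auc(case_scores, control_scores):
    """Rank-based AUC: the share of (case, control) pairs in which the case
    has the higher predicted risk; tied pairs count as half a win."""
    wins = 0.0
    for case, control in product(case_scores, control_scores):
        if case > control:
            wins += 1.0
        elif case == control:
            wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks, for illustration only.
cases = [0.9, 0.7, 0.4]
controls = [0.8, 0.3, 0.2]
print(auc(cases, controls))  # 7 of the 9 pairs are correctly ordered
```

To compare two models, compute this quantity for each model's scores on the same subjects. Testing whether a difference such as 0.645 vs 0.629 is significant requires a paired test, such as DeLong's test, which this sketch does not implement.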

Relevance:

30.00%

Abstract:

Hospitals nowadays collect vast amounts of data related to patient records. These data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at extracting useful knowledge from raw data. This work describes the implementation of a medical data mining project based on the CRISP-DM methodology. Recent real-world data on inpatient hospitalization, covering 2000 to 2013, were collected from a Portuguese hospital. The goal was to predict generic hospital Length Of Stay from indicators commonly available during the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted and six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best model was obtained with the Random Forest method, which achieved a high coefficient of determination (0.81). This model was then interpreted with a sensitivity analysis procedure. The analysis revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. This extracted knowledge confirmed that the predictive model is credible and potentially valuable for supporting the decisions of hospital managers.
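The coefficient of determination compares a model's squared errors with those of always predicting the mean. Predicting the mean is roughly what an Average Prediction baseline does, assuming it predicts the mean of the training outputs. The Python sketch below applies the standard formula, R² = 1 − SS_res / SS_tot, to hypothetical length-of-stay values in days. It is not the study's data or code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed lengths of stay (days) and two sets of predictions.
observed = [2.0, 4.0, 6.0, 8.0]
mean_baseline = [5.0] * 4        # always predicts the observed mean -> R^2 = 0
model = [2.5, 3.5, 6.5, 7.5]     # hypothetical model predictions
print(r_squared(observed, mean_baseline), r_squared(observed, model))
```

An R² of 0.81 therefore means the model's squared error is 19% of the error of a constant mean prediction.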

Relevance:

30.00%

Abstract:

Customer lifetime value (LTV) uses client characteristics, such as recency, frequency and monetary (RFM) value, to describe the value of a client over time in terms of profitability. We apply the LTV concept to telemarketing to improve return on investment. The case study is a recent (2008 to 2013) and real set of bank campaigns selling long-term deposits. The goal was to exploit the history of past contacts to extract additional knowledge. A total of twelve LTV input variables were tested under a forward selection method with a realistic rolling-window scheme, highlighting the validity of five new LTV features. Our data-driven LTV approach using neural networks improved the cumulative Lift curve for targeting deposit subscribers. The gain was up to 4 percentage points (pp) over a baseline model with no history data. Explanatory knowledge was also extracted from the proposed model, revealing two highly relevant LTV features: the result of the previous campaign selling the same product and the frequency of the client's past successes. These results are particularly valuable for contact center companies, which can improve predictive performance without having to ask the companies they serve for more information.
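As it is commonly plotted, a cumulative Lift curve ranks clients by predicted subscription probability. For each fraction of clients contacted, it shows the fraction of all subscribers reached. The minimal Python sketch below assumes hypothetical scores and binary outcomes (1 = subscribed). The abstract does not say at which contact fraction the 4 pp gain was measured.

```python
def cumulative_lift(scores, labels):
    """Return (fraction_contacted, fraction_of_subscribers_reached) pairs,
    contacting clients in decreasing order of predicted score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positive = sum(labels)
    n = len(ranked)
    curve = []
    reached = 0
    for i, (_, label) in enumerate(ranked, start=1):
        reached += label
        curve.append((i / n, reached / total_positive))
    return curve


# Hypothetical model scores and outcomes for four clients.
print(cumulative_lift([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

A 4 pp improvement at a given contact fraction means the LTV model's curve lies 4 percentage points above the baseline's curve at that point.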

Relevance:

30.00%

Abstract:

The identification of new and druggable targets in bacteria is a critical endeavour in pharmaceutical research on novel antibiotics to fight infectious agents. The rapid emergence of resistant bacteria makes today's antibiotics increasingly ineffective, which increases the need for new pharmacological targets and novel classes of antibacterial drugs. A new model has been developed to predict potential drug targets in the Enterobacteriaceae family [1]. It combines the singular value decomposition technique with biological filters. The filters consist of a set of protein properties associated with bacterial drug targets and of similarity to protein-coding essential genes of E. coli. This model identified 99 potential target proteins in the studied bacterial family, covering eight different functions, which suggests that disrupting the activities of these proteins is critical for the cells. One of these candidates was selected for target confirmation. To find target modulators, receptor-based pharmacophore hypotheses were built and used to screen a virtual library of compounds. Post-screening filters, based on physicochemical and topological similarity to known Gram-negative antibiotics, were then applied to the retrieved compounds. Screening hits passing all filters were docked into the protein's catalytic groove. The 15 most promising compounds were purchased from their chemical vendors for experimental testing in vitro. To the best of our knowledge, this is the first attempt to rationalize the search for compounds that probe the relevance of this candidate as a new pharmacological target.

Relevance:

30.00%

Abstract:

Currently, the quality of the Indonesian national road network is inadequate due to several constraints, including overcapacity and overloaded trucks. In developing countries, road infrastructure deteriorates quickly, budgets are tightly restricted and traffic is growing fast. Together, these conditions have created an emerging need to improve the performance of the highway maintenance system. However, the large number of intervening factors and their complex effects require advanced tools to solve this problem. The strong learning capabilities of Data Mining (DM) make it a powerful solution. DM tools have previously been applied successfully to complex, multi-dimensional problems in various scientific fields. It is therefore expected that DM can be used on the large amount of pavement and traffic data to identify the relationships between variables and to support predictions. In this paper, we present a new approach based on DM techniques to predict the International Roughness Index (IRI) of pavement. DM was used to analyze the initial IRI data, including age, Equivalent Single Axle Load (ESAL), cracks, potholes, rutting and long cracks. The model was developed and verified using data from the Integrated Indonesia Road Management System (IIRMS). These data were measured with the National Association of Australian State Road Authorities (NAASRA) roughness meter. The results of the proposed approach are compared with the IIRMS analytical model adapted to the IRI, and the advantages of the new approach are highlighted. We show that the novel data-driven model is able to learn, with high accuracy, the complex relationships between the IRI and the contributing factors of overloaded trucks.
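ESAL converts mixed traffic into an equivalent number of standard-axle passes, which helps explain why overloaded trucks matter so much for pavement deterioration. A common simplification is the generalized fourth-power law. Under it, an axle's damage relative to the standard axle scales with the fourth power of the load ratio. The Python sketch below assumes a standard single axle of 80 kN and an exponent of 4. Both are illustrative assumptions, since the abstract does not describe how ESAL was computed in the IIRMS data.

```python
STANDARD_AXLE_KN = 80.0  # assumed standard single axle (about 18 kips)


def esal_factor(axle_load_kn, standard_kn=STANDARD_AXLE_KN, exponent=4.0):
    """Load equivalency of one axle pass via the generalized fourth-power law."""
    return (axle_load_kn / standard_kn) ** exponent


def total_esal(axle_loads_kn):
    """Sum of equivalency factors over a list of single-axle passes."""
    return sum(esal_factor(load) for load in axle_loads_kn)


# Hypothetical axle loads: one standard axle and one overloaded by 50 %.
print(esal_factor(80.0))   # equals one standard axle pass
print(esal_factor(120.0))  # (120 / 80) ** 4 = 1.5 ** 4
print(total_esal([80.0, 120.0]))
```

Under these assumptions, a 50% overload causes about five times the damage of a standard axle. This nonlinearity is what a data-driven IRI model has to capture.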

Relevance:

30.00%

Abstract:

The use of genome-scale metabolic models has been increasing rapidly in fields such as metabolic engineering. An important part of a metabolic model is the biomass equation, since this reaction ultimately determines the model's predictive capacity in terms of essentiality and flux distributions. Thus, to obtain a reliable metabolic model, the biomass precursors and their coefficients must be as precise as possible. Ideally, the biomass composition would be determined experimentally. When no experimental data are available, it is approximated from closely related organisms. Computational methods, however, can extract some information from the genome, such as amino acid and nucleotide compositions. This study had two main objectives. The first was to compare the biomass composition of several organisms. The second was to evaluate how biomass precursor coefficients affect the predictive ability of several genome-scale metabolic models, by comparing their predictions with experimental data from the literature. To that end, the biomass macromolecular composition was determined experimentally for several organisms, and their amino acid composition was estimated both experimentally and computationally. Sensitivity analyses were also performed with the Escherichia coli iAF1260 metabolic model for specific growth rates and flux distributions. The results suggest that the macromolecular composition is conserved among related organisms. In contrast, the experimental amino acid compositions show no similarities among related organisms. It was also observed that macromolecular composition affects specific growth rates and flux distributions more than amino acid composition does, even when data from closely related organisms are used.
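Biomass precursor coefficients are usually expressed in mmol per gram of dry weight (mmol/gDW). One common way to derive amino acid coefficients combines the protein mass fraction of the biomass with the molar amino acid composition. The moles of residues per gram of protein follow from the average residue molecular weight. The Python sketch below uses hypothetical numbers and names throughout. It is a generic calculation, not the procedure used in the study.

```python
def amino_acid_coefficients(protein_fraction, molar_fractions, residue_mw):
    """Convert a protein mass fraction (g/gDW) and a molar amino acid
    composition into biomass coefficients in mmol/gDW.

    protein_fraction: grams of protein per gram of dry weight.
    molar_fractions: dict amino acid -> mole fraction (should sum to 1).
    residue_mw: dict amino acid -> residue molecular weight in g/mol
                (free amino acid weight minus one water, about 18 g/mol).
    """
    # Average mass of one residue (g/mol), weighted by molar composition.
    mean_residue_mw = sum(x * residue_mw[aa] for aa, x in molar_fractions.items())
    # Total residues per gram of dry weight, converted from mol to mmol.
    residues_mmol = protein_fraction / mean_residue_mw * 1000.0
    return {aa: x * residues_mmol for aa, x in molar_fractions.items()}


# Hypothetical two-residue "proteome", for illustration only.
coeffs = amino_acid_coefficients(
    protein_fraction=0.5,
    molar_fractions={"ala": 0.6, "gly": 0.4},
    residue_mw={"ala": 71.08, "gly": 57.05},
)
print(coeffs)
```

To probe sensitivity to composition, perturb protein_fraction or the molar fractions, then rerun the model's growth simulation. That step requires a constraint-based modeling tool, which is not shown here.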