903 resultados para Data-driven Methods


Relevância:

40.00% 40.00%

Publicador:

Resumo:

As a thorough aggregation of probability and graph theory, Bayesian networks currently enjoy widespread interest as a means for studying factors that affect the coherent evaluation of scientific evidence in forensic science. Paper I of this series of papers intends to contribute to the discussion of Bayesian networks as a framework that is helpful for both illustrating and implementing statistical procedures that are commonly employed for the study of uncertainties (e.g. the estimation of unknown quantities). While the respective statistical procedures are widely described in literature, the primary aim of this paper is to offer an essentially non-technical introduction on how interested readers may use these analytical approaches - with the help of Bayesian networks - for processing their own forensic science data. Attention is mainly drawn to the structure and underlying rationale of a series of basic and context-independent network fragments that users may incorporate as building blocs while constructing larger inference models. As an example of how this may be done, the proposed concepts will be used in a second paper (Part II) for specifying graphical probability networks whose purpose is to assist forensic scientists in the evaluation of scientific evidence encountered in the context of forensic document examination (i.e. results of the analysis of black toners present on printed or copied documents).

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This paper investigates the use of ensemble of predictors in order to improve the performance of spatial prediction methods. Support vector regression (SVR), a popular method from the field of statistical machine learning, is used. Several instances of SVR are combined using different data sampling schemes (bagging and boosting). Bagging shows good performance, and proves to be more computationally efficient than training a single SVR model while reducing error. Boosting, however, does not improve results on this specific problem.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This project develops a smartphone-based prototype system that supplements the 511 system to improve its dynamic traffic routing service to state highway users under non-recurrent congestion. This system will save considerable time to provide crucial traffic information and en-route assistance to travelers for them to avoid being trapped in traffic congestion due to accidents, work zones, hazards, or special events. It also creates a feedback loop between travelers and responsible agencies that enable the state to effectively collect, fuse, and analyze crowd-sourced data for next-gen transportation planning and management. This project can result in substantial economic savings (e.g. less traffic congestion, reduced fuel wastage and emissions) and safety benefits for the freight industry and society due to better dissemination of real-time traffic information by highway users. Such benefits will increase significantly in future with the expected increase in freight traffic on the network. The proposed system also has the flexibility to be integrated with various transportation management modules to assist state agencies to improve transportation services and daily operations.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

A statewide study was performed to develop regional regression equations for estimating selected annual exceedance- probability statistics for ungaged stream sites in Iowa. The study area comprises streamgages located within Iowa and 50 miles beyond the State’s borders. Annual exceedanceprobability estimates were computed for 518 streamgages by using the expected moments algorithm to fit a Pearson Type III distribution to the logarithms of annual peak discharges for each streamgage using annual peak-discharge data through 2010. The estimation of the selected statistics included a Bayesian weighted least-squares/generalized least-squares regression analysis to update regional skew coefficients for the 518 streamgages. Low-outlier and historic information were incorporated into the annual exceedance-probability analyses, and a generalized Grubbs-Beck test was used to detect multiple potentially influential low flows. Also, geographic information system software was used to measure 59 selected basin characteristics for each streamgage. Regional regression analysis, using generalized leastsquares regression, was used to develop a set of equations for each flood region in Iowa for estimating discharges for ungaged stream sites with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities, which are equivalent to annual flood-frequency recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years, respectively. A total of 394 streamgages were included in the development of regional regression equations for three flood regions (regions 1, 2, and 3) that were defined for Iowa based on landform regions and soil regions. Average standard errors of prediction range from 31.8 to 45.2 percent for flood region 1, 19.4 to 46.8 percent for flood region 2, and 26.5 to 43.1 percent for flood region 3. The pseudo coefficients of determination for the generalized leastsquares equations range from 90.8 to 96.2 percent for flood region 1, 91.5 to 97.9 percent for flood region 2, and 92.4 to 96.0 percent for flood region 3. The regression equations are applicable only to stream sites in Iowa with flows not significantly affected by regulation, diversion, channelization, backwater, or urbanization and with basin characteristics within the range of those used to develop the equations. These regression equations will be implemented within the U.S. Geological Survey StreamStats Web-based geographic information system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the eight selected statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged sites also are provided by the Web-based tool. StreamStats also allows users to click on any streamgage in Iowa and estimates computed for these eight selected statistics are provided for the streamgage.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In this paper, some steganalytic techniques designed to detect the existence of hidden messages using histogram shifting methods are presented. Firstly, some techniques to identify specific methods of histogram shifting, based on visible marks on the histogram or abnormal statistical distributions are suggested. Then, we present a general technique capable of detecting all histogram shifting techniques analyzed. This technique is based on the effect of histogram shifting methods on the "volatility" of the histogram of differences and the study of its reduction whenever new data are hidden.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

In anticipation of regulation involving numeric turbidity limit at highway construction sites, research was done into the most appropriate, affordable methods for surface water monitoring. Measuring sediment concentration in streams may be conducted a number of ways. As part of a project funded by the Iowa Department of Transportation, several testing methods were explored to determine the most affordable, appropriate methods for data collection both in the field and in the lab. The primary purpose of the research was to determine the exchangeability of the acrylic transparency tube for water clarity analysis as compared to the turbidimeter.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

This work is devoted to the problem of reconstructing the basis weight structure at paper web with black{box techniques. The data that is analyzed comes from a real paper machine and is collected by an o®-line scanner. The principal mathematical tool used in this work is Autoregressive Moving Average (ARMA) modelling. When coupled with the Discrete Fourier Transform (DFT), it gives a very flexible and interesting tool for analyzing properties of the paper web. Both ARMA and DFT are independently used to represent the given signal in a simplified version of our algorithm, but the final goal is to combine the two together. Ljung-Box Q-statistic lack-of-fit test combined with the Root Mean Squared Error coefficient gives a tool to separate significant signals from noise.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

OBJECTIVE: Routinely collected health data, collected for administrative and clinical purposes, without specific a priori research questions, are increasingly used for observational, comparative effectiveness, health services research, and clinical trials. The rapid evolution and availability of routinely collected data for research has brought to light specific issues not addressed by existing reporting guidelines. The aim of the present project was to determine the priorities of stakeholders in order to guide the development of the REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement. METHODS: Two modified electronic Delphi surveys were sent to stakeholders. The first determined themes deemed important to include in the RECORD statement, and was analyzed using qualitative methods. The second determined quantitative prioritization of the themes based on categorization of manuscript headings. The surveys were followed by a meeting of RECORD working committee, and re-engagement with stakeholders via an online commentary period. RESULTS: The qualitative survey (76 responses of 123 surveys sent) generated 10 overarching themes and 13 themes derived from existing STROBE categories. Highest-rated overall items for inclusion were: Disease/exposure identification algorithms; Characteristics of the population included in databases; and Characteristics of the data. In the quantitative survey (71 responses of 135 sent), the importance assigned to each of the compiled themes varied depending on the manuscript section to which they were assigned. Following the working committee meeting, online ranking by stakeholders provided feedback and resulted in revision of the final checklist. CONCLUSIONS: The RECORD statement incorporated the suggestions provided by a large, diverse group of stakeholders to create a reporting checklist specific to observational research using routinely collected health data. Our findings point to unique aspects of studies conducted with routinely collected health data and the perceived need for better reporting of methodological issues.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Recent advances in machine learning methods enable increasingly the automatic construction of various types of computer assisted methods that have been difficult or laborious to program by human experts. The tasks for which this kind of tools are needed arise in many areas, here especially in the fields of bioinformatics and natural language processing. The machine learning methods may not work satisfactorily if they are not appropriately tailored to the task in question. However, their learning performance can often be improved by taking advantage of deeper insight of the application domain or the learning problem at hand. This thesis considers developing kernel-based learning algorithms incorporating this kind of prior knowledge of the task in question in an advantageous way. Moreover, computationally efficient algorithms for training the learning machines for specific tasks are presented. In the context of kernel-based learning methods, the incorporation of prior knowledge is often done by designing appropriate kernel functions. Another well-known way is to develop cost functions that fit to the task under consideration. For disambiguation tasks in natural language, we develop kernel functions that take account of the positional information and the mutual similarities of words. It is shown that the use of this information significantly improves the disambiguation performance of the learning machine. Further, we design a new cost function that is better suitable for the task of information retrieval and for more general ranking problems than the cost functions designed for regression and classification. We also consider other applications of the kernel-based learning algorithms such as text categorization, and pattern recognition in differential display. We develop computationally efficient algorithms for training the considered learning machines with the proposed kernel functions. We also design a fast cross-validation algorithm for regularized least-squares type of learning algorithm. Further, an efficient version of the regularized least-squares algorithm that can be used together with the new cost function for preference learning and ranking tasks is proposed. In summary, we demonstrate that the incorporation of prior knowledge is possible and beneficial, and novel advanced kernels and cost functions can be used in algorithms efficiently.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The objective of this dissertation is to improve the dynamic simulation of fluid power circuits. A fluid power circuit is a typical way to implement power transmission in mobile working machines, e.g. cranes, excavators etc. Dynamic simulation is an essential tool in developing controllability and energy-efficient solutions for mobile machines. Efficient dynamic simulation is the basic requirement for the real-time simulation. In the real-time simulation of fluid power circuits there exist numerical problems due to the software and methods used for modelling and integration. A simulation model of a fluid power circuit is typically created using differential and algebraic equations. Efficient numerical methods are required since differential equations must be solved in real time. Unfortunately, simulation software packages offer only a limited selection of numerical solvers. Numerical problems cause noise to the results, which in many cases leads the simulation run to fail. Mathematically the fluid power circuit models are stiff systems of ordinary differential equations. Numerical solution of the stiff systems can be improved by two alternative approaches. The first is to develop numerical solvers suitable for solving stiff systems. The second is to decrease the model stiffness itself by introducing models and algorithms that either decrease the highest eigenvalues or neglect them by introducing steady-state solutions of the stiff parts of the models. The thesis proposes novel methods using the latter approach. The study aims to develop practical methods usable in dynamic simulation of fluid power circuits using explicit fixed-step integration algorithms. In this thesis, twomechanisms whichmake the systemstiff are studied. These are the pressure drop approaching zero in the turbulent orifice model and the volume approaching zero in the equation of pressure build-up. These are the critical areas to which alternative methods for modelling and numerical simulation are proposed. Generally, in hydraulic power transmission systems the orifice flow is clearly in the turbulent area. The flow becomes laminar as the pressure drop over the orifice approaches zero only in rare situations. These are e.g. when a valve is closed, or an actuator is driven against an end stopper, or external force makes actuator to switch its direction during operation. This means that in terms of accuracy, the description of laminar flow is not necessary. But, unfortunately, when a purely turbulent description of the orifice is used, numerical problems occur when the pressure drop comes close to zero since the first derivative of flow with respect to the pressure drop approaches infinity when the pressure drop approaches zero. Furthermore, the second derivative becomes discontinuous, which causes numerical noise and an infinitely small integration step when a variable step integrator is used. A numerically efficient model for the orifice flow is proposed using a cubic spline function to describe the flow in the laminar and transition areas. Parameters for the cubic spline function are selected such that its first derivative is equal to the first derivative of the pure turbulent orifice flow model in the boundary condition. In the dynamic simulation of fluid power circuits, a tradeoff exists between accuracy and calculation speed. This investigation is made for the two-regime flow orifice model. Especially inside of many types of valves, as well as between them, there exist very small volumes. The integration of pressures in small fluid volumes causes numerical problems in fluid power circuit simulation. Particularly in realtime simulation, these numerical problems are a great weakness. The system stiffness approaches infinity as the fluid volume approaches zero. If fixed step explicit algorithms for solving ordinary differential equations (ODE) are used, the system stability would easily be lost when integrating pressures in small volumes. To solve the problem caused by small fluid volumes, a pseudo-dynamic solver is proposed. Instead of integration of the pressure in a small volume, the pressure is solved as a steady-state pressure created in a separate cascade loop by numerical integration. The hydraulic capacitance V/Be of the parts of the circuit whose pressures are solved by the pseudo-dynamic method should be orders of magnitude smaller than that of those partswhose pressures are integrated. The key advantage of this novel method is that the numerical problems caused by the small volumes are completely avoided. Also, the method is freely applicable regardless of the integration routine applied. The superiority of both above-mentioned methods is that they are suited for use together with the semi-empirical modelling method which necessarily does not require any geometrical data of the valves and actuators to be modelled. In this modelling method, most of the needed component information can be taken from the manufacturer’s nominal graphs. This thesis introduces the methods and shows several numerical examples to demonstrate how the proposed methods improve the dynamic simulation of various hydraulic circuits.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Multiple Linear Regression (MLR) are some of the mathematical pre- liminaries that are discussed prior to explaining PLS and PCR models. Both PLS and PCR are applied to real spectral data and their di erences and similarities are discussed in this thesis. The challenge lies in establishing the optimum number of components to be included in either of the models but this has been overcome by using various diagnostic tools suggested in this thesis. Correspondence analysis (CA) and PLS were applied to ecological data. The idea of CA was to correlate the macrophytes species and lakes. The di erences between PLS model for ecological data and PLS for spectral data are noted and explained in this thesis. i

Relevância:

40.00% 40.00%

Publicador:

Resumo:

The pumping processes requiring wide range of flow are often equipped with parallelconnected centrifugal pumps. In parallel pumping systems, the use of variable speed control allows that the required output for the process can be delivered with a varying number of operated pump units and selected rotational speed references. However, the optimization of the parallel-connected rotational speed controlled pump units often requires adaptive modelling of both parallel pump characteristics and the surrounding system in varying operation conditions. The available information required for the system modelling in typical parallel pumping applications such as waste water treatment and various cooling and water delivery pumping tasks can be limited, and the lack of real-time operation point monitoring often sets limits for accurate energy efficiency optimization. Hence, alternatives for easily implementable control strategies which can be adopted with minimum system data are necessary. This doctoral thesis concentrates on the methods that allow the energy efficient use of variable speed controlled parallel pumps in system scenarios in which the parallel pump units consist of a centrifugal pump, an electric motor, and a frequency converter. Firstly, the suitable operation conditions for variable speed controlled parallel pumps are studied. Secondly, methods for determining the output of each parallel pump unit using characteristic curve-based operation point estimation with frequency converter are discussed. Thirdly, the implementation of the control strategy based on real-time pump operation point estimation and sub-optimization of each parallel pump unit is studied. The findings of the thesis support the idea that the energy efficiency of the pumping can be increased without the installation of new, more efficient components in the systems by simply adopting suitable control strategies. An easily implementable and adaptive control strategy for variable speed controlled parallel pumping systems can be created by utilizing the pump operation point estimation available in modern frequency converters. Hence, additional real-time flow metering, start-up measurements, and detailed system model are unnecessary, and the pumping task can be fulfilled by determining a speed reference for each parallel-pump unit which suggests the energy efficient operation of the pumping system.