935 results for test data generation
Abstract:
BACKGROUND: Complete investigation of thrombophilic or hemorrhagic clinical presentations is a time-, apparatus-, and cost-intensive process. Sensitive screening tests for characterizing the overall function of the hemostatic system, or defined parts of it, would be very useful. For this purpose, we are developing an electrochemical biosensor system that allows measurement of thrombin generation in whole blood as well as in plasma. METHODS: The measuring system consists of a single-use electrochemical sensor in the shape of a strip and a measuring unit connected to a personal computer, recording the electrical signal. Blood is added to a specific reagent mixture immobilized in dry form on the strip, including a coagulation activator (e.g., tissue factor or silica) and an electrogenic substrate specific to thrombin. RESULTS: Increasing thrombin concentrations gave standard curves with progressively increasing maximal current and decreasing time to reach the peak. Because the measurement was unaffected by color or turbidity, any type of blood sample could be analyzed: platelet-poor plasma, platelet-rich plasma, and whole blood. The test strips with the predried reagents were stable when stored for several months before testing. Analysis of the combined results obtained with different activators allowed discrimination between defects of the extrinsic, intrinsic, and common coagulation pathways. Activated protein C (APC) predried on the strips allowed identification of APC-resistance in plasma and whole blood samples. CONCLUSIONS: The biosensor system provides a new method for assessing thrombin generation in plasma or whole blood samples as small as 10 microL. The assay is easy to use, thus allowing it to be performed in a point-of-care setting.
Abstract:
A problem frequently encountered in Data Envelopment Analysis (DEA) is that the total number of inputs and outputs included tends to be too large relative to the sample size. One way to counter this problem is to combine several inputs (or outputs) into (meaningful) aggregate variables, thereby reducing the dimension of the input (or output) vector. A direct effect of input aggregation is to reduce the number of constraints, which in turn alters the optimal value of the objective function. In this paper, we show how a statistical test proposed by Banker (1993) may be applied to test the validity of a specific way of aggregating several inputs. An empirical application using data from Indian manufacturing for the year 2002-03 is included as an example of the proposed test.
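Banker's (1993) statistic and the paper's empirical DEA model are not reproduced in the abstract; the following sketch only illustrates how such an aggregation test can be organized, assuming input-oriented CCR efficiencies, exponentially distributed inefficiencies, synthetic data, and an arbitrary aggregation of the first two inputs.

```python
# Illustrative sketch: testing input aggregation in DEA with an F-type test in the
# spirit of Banker (1993). Data and the aggregation rule are invented for the example.
import numpy as np
from scipy.optimize import linprog
from scipy.stats import f

def ccr_efficiency(X, Y):
    """Input-oriented CCR efficiency for each DMU.
    X: (n_dmu, n_inputs), Y: (n_dmu, n_outputs)."""
    n, m = X.shape
    s = Y.shape[1]
    scores = []
    for o in range(n):
        # Decision variables: [theta, lambda_1..lambda_n]; minimize theta.
        c = np.zeros(1 + n); c[0] = 1.0
        # Input constraints: sum_j lambda_j x_ij - theta * x_io <= 0
        A_in = np.hstack([-X[o].reshape(m, 1), X.T])
        # Output constraints: -sum_j lambda_j y_rj <= -y_ro
        A_out = np.hstack([np.zeros((s, 1)), -Y.T])
        res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                      b_ub=np.concatenate([np.zeros(m), -Y[o]]),
                      bounds=[(None, None)] + [(0, None)] * n,
                      method="highs")
        scores.append(res.x[0])
    return np.array(scores)

rng = np.random.default_rng(0)
n = 40
Y = rng.uniform(10, 50, size=(n, 1))                      # one output
X = np.column_stack([Y[:, 0] * rng.uniform(1.0, 1.6, n),   # three inputs
                     Y[:, 0] * rng.uniform(0.5, 0.9, n),
                     rng.uniform(5, 15, n)])

theta_full = ccr_efficiency(X, Y)                 # disaggregated inputs
X_agg = np.column_stack([X[:, 0] + X[:, 1],       # aggregate inputs 1 and 2
                         X[:, 2]])
theta_agg = ccr_efficiency(X_agg, Y)

# F-type comparison: inefficiencies assumed i.i.d. exponential; aggregation is
# rejected if it inflates inefficiency significantly relative to the full model.
ineff_full = 1.0 / theta_full - 1.0
ineff_agg = 1.0 / theta_agg - 1.0
F_stat = ineff_agg.sum() / ineff_full.sum()
p_value = 1.0 - f.cdf(F_stat, 2 * n, 2 * n)
print(f"F = {F_stat:.3f}, p = {p_value:.3f}")
```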
Abstract:
Next-generation DNA sequencing platforms can effectively detect the entire spectrum of genomic variation and are emerging as a major tool for systematic exploration of the universe of variants and interactions in the entire genome. However, the data produced by next-generation sequencing technologies suffer from three basic problems: sequence errors, assembly errors, and missing data. Current statistical methods for genetic analysis are well suited for detecting the association of common variants, but are less suitable for rare variants. This poses a great challenge for sequence-based genetic studies of complex diseases. This dissertation used the genome continuum model as a general principle, and stochastic calculus and functional data analysis as tools, to develop novel and powerful statistical methods for the next generation of association studies of both qualitative and quantitative traits in the context of sequencing data, ultimately shifting the paradigm of association analysis from the current locus-by-locus analysis to the collective analysis of genome regions. In this project, functional principal component (FPC) methods coupled with high-dimensional data reduction techniques were used to develop novel and powerful methods for testing the association of the entire spectrum of genetic variation within a segment of genome or a gene, regardless of whether the variants are common or rare. Classical quantitative genetics suffers from high type I error rates and low power for rare variants. To overcome these limitations for resequencing data, this project used functional linear models with scalar response to develop statistics for identifying quantitative trait loci (QTLs) for both common and rare variants. To illustrate their application, the functional linear models were applied to five quantitative traits in the Framingham Heart Study. This project also proposed a novel concept of gene-gene co-association, in which a gene or a genomic region is taken as the unit of association analysis, and used stochastic calculus to develop a unified framework for testing the association of multiple genes or genomic regions for both common and rare alleles. The proposed methods were applied to gene-gene co-association analysis of psoriasis in two independent GWAS datasets, which led to the discovery of networks significantly associated with psoriasis.
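As an illustration of the region-based testing idea, the sketch below reduces the genotypes of a region to leading principal component scores and tests them jointly against a quantitative trait; the simulated data, the 80% variance cut-off, and the plain F-test are assumptions made for the example, not the dissertation's actual FPC procedure.

```python
# Illustrative sketch of a principal-component-based region association test;
# data and tuning choices are invented for the example.
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(1)
n, m = 500, 60                      # individuals, variant sites in the region
maf = rng.uniform(0.005, 0.3, m)    # mix of rare and common variants
G = rng.binomial(2, maf, size=(n, m)).astype(float)   # genotype matrix (0/1/2)
beta = np.zeros(m); beta[rng.choice(m, 5, replace=False)] = 0.4
y = G @ beta + rng.normal(size=n)   # quantitative trait

# Treat each individual's genotype profile across the region as one observation:
# center the profiles and extract the leading principal component scores.
Gc = G - G.mean(axis=0)
U, S, Vt = np.linalg.svd(Gc, full_matrices=False)
var_explained = np.cumsum(S**2) / np.sum(S**2)
k = int(np.searchsorted(var_explained, 0.80) + 1)   # keep ~80% of the variation
scores = U[:, :k] * S[:k]                           # component scores, shape (n, k)

# F-test of the region: trait regressed on the k scores vs. intercept-only model.
X = np.column_stack([np.ones(n), scores])
bhat = np.linalg.lstsq(X, y, rcond=None)[0]
rss1 = np.sum((y - X @ bhat) ** 2)
rss0 = np.sum((y - y.mean()) ** 2)
F_stat = ((rss0 - rss1) / k) / (rss1 / (n - k - 1))
p_value = 1.0 - f.cdf(F_stat, k, n - k - 1)
print(f"{k} components, F = {F_stat:.2f}, p = {p_value:.3g}")
```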
Abstract:
An interim analysis is usually applied in later phase II or phase III trials to find convincing evidence of a significant treatment difference that may lead to terminating the trial earlier than planned. This can save patient resources and shorten drug development and approval time. Ethics and economics are also reasons to stop a trial early. In clinical trials involving eyes, ears, knees, arms, kidneys, lungs, and other clustered treatments, the data may include distribution-free random variables from both matched and unmatched subjects in a single study. It is important to properly include both types of subjects in the interim and final analyses so that maximum efficiency of statistical and clinical inference can be obtained at the different stages of the trial. So far, no publication has applied a statistical method for distribution-free data with matched and unmatched subjects to the interim analysis of clinical trials. In this simulation study, the hybrid statistic was used to estimate empirical powers and empirical type I errors across simulated datasets with different sample sizes, effect sizes, correlation coefficients for the matched pairs, and data distributions, in the interim and final analyses with four different group sequential methods. Empirical powers and empirical type I errors were also compared with those estimated using the meta-analysis t-test on the same simulated datasets. The results show that, compared with the meta-analysis t-test commonly used for normally distributed observations, the hybrid statistic has greater power for data from normally, log-normally, and multinomially distributed random variables with matched and unmatched subjects and with outliers. Power rose with increasing sample size, effect size, and correlation coefficient for the matched pairs. In addition, lower type I errors were observed with the hybrid statistic, indicating that the test is also conservative for data with outliers in the interim analysis of clinical trials.
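The hybrid statistic itself is not defined in the abstract; as a stand-in, the sketch below combines the matched (paired) and unmatched comparisons by inverse-variance weighting and runs a two-look group sequential simulation with the usual O'Brien-Fleming critical values, so it only illustrates the overall simulation structure, not the actual test.

```python
# Illustrative interim-analysis simulation with matched and unmatched subjects.
# The inverse-variance combination is a stand-in, NOT the paper's hybrid statistic;
# boundaries are the standard two-look O'Brien-Fleming values (two-sided alpha=0.05),
# and all simulation settings are made up.
import numpy as np

rng = np.random.default_rng(2)

def combined_z(x_m, y_m, x_u, y_u):
    """Combine a paired comparison (matched subjects) and a two-sample comparison
    (unmatched subjects) by inverse-variance weighting of the two effect estimates."""
    d = x_m - y_m
    est_m, var_m = d.mean(), d.var(ddof=1) / len(d)
    est_u = x_u.mean() - y_u.mean()
    var_u = x_u.var(ddof=1) / len(x_u) + y_u.var(ddof=1) / len(y_u)
    w_m, w_u = 1.0 / var_m, 1.0 / var_u
    est = (w_m * est_m + w_u * est_u) / (w_m + w_u)
    return est * np.sqrt(w_m + w_u)

def one_trial(n_pairs, n_unmatched, effect, rho, bounds=(2.797, 1.977)):
    """Two-look group sequential trial; returns True if H0 is rejected."""
    cov = [[1.0, rho], [rho, 1.0]]
    xy_m = rng.multivariate_normal([effect, 0.0], cov, size=n_pairs)
    x_u = rng.normal(effect, 1.0, n_unmatched)
    y_u = rng.normal(0.0, 1.0, n_unmatched)
    hp, hu = n_pairs // 2, n_unmatched // 2
    z_interim = combined_z(xy_m[:hp, 0], xy_m[:hp, 1], x_u[:hu], y_u[:hu])
    if abs(z_interim) >= bounds[0]:          # stop early at the interim look
        return True
    z_final = combined_z(xy_m[:, 0], xy_m[:, 1], x_u, y_u)
    return abs(z_final) >= bounds[1]

power = np.mean([one_trial(60, 40, effect=0.4, rho=0.5) for _ in range(2000)])
typeI = np.mean([one_trial(60, 40, effect=0.0, rho=0.5) for _ in range(2000)])
print("empirical power:", power, " empirical type I error:", typeI)
```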
Abstract:
Next-generation sequencing (NGS) technology has become a prominent tool in biological and biomedical research. However, NGS data analysis, such as de novo assembly, mapping, and variant detection, is far from mature, and the high sequencing error rate is one of the major problems. To minimize the impact of sequencing errors, we developed a highly robust and efficient method, MTM, to correct errors in NGS reads. We demonstrated the effectiveness of MTM on both single-cell data with highly non-uniform coverage and normal data with uniformly high coverage, showing that MTM's performance does not depend on the coverage of the sequencing reads. MTM was also compared with Hammer and Quake, the best methods for correcting non-uniform and uniform data, respectively. For non-uniform data, MTM outperformed both Hammer and Quake. For uniform data, MTM showed better performance than Quake and results comparable to Hammer. Better error correction with MTM improved the quality of downstream analyses such as mapping and SNP detection. SNP calling is a major application of NGS technologies. However, the existence of sequencing errors complicates this process, especially for the low coverage (
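MTM's algorithm is not described in the abstract; the toy sketch below only shows the generic k-mer-spectrum idea used by correctors such as Quake and Hammer, where k-mers seen fewer times than a threshold are considered untrusted and a single-base edit that makes every overlapping k-mer trusted is accepted. Thresholds and reads are toy values.

```python
# Toy k-mer-spectrum error correction (the general idea behind tools like
# Quake/Hammer); this is NOT the MTM algorithm, which the abstract does not detail.
from collections import Counter

K, MIN_COUNT = 5, 2   # toy k-mer size and trust threshold

def kmer_counts(reads, k=K):
    counts = Counter()
    for r in reads:
        for i in range(len(r) - k + 1):
            counts[r[i:i + k]] += 1
    return counts

def is_trusted(read, counts, k=K):
    return all(counts[read[i:i + k]] >= MIN_COUNT
               for i in range(len(read) - k + 1))

def correct_read(read, counts, k=K):
    """Try single-base substitutions; keep the first one that makes every
    overlapping k-mer trusted. Return the read unchanged if none works."""
    if is_trusted(read, counts, k):
        return read
    for pos in range(len(read)):
        for base in "ACGT":
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            if is_trusted(candidate, counts, k):
                return candidate
    return read

reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "ACGTATGTAC"]  # last read has an error
counts = kmer_counts(reads)
print(correct_read(reads[-1], counts))   # -> ACGTACGTAC
```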
Abstract:
A small Positron Emission Tomography demonstrator based on LYSO slabs and Silicon Photomultiplier (SiPM) matrices is under construction at the University and INFN of Pisa. In this paper we present the characterization results of the read-out electronics and of the detection system. Two SiPM matrices, each composed of 8 × 8 SiPM pixels with a 1.5 mm pitch, have been coupled one-to-one to a LYSO crystal array. Custom front-end ASICs were used to read the 64 channels of each matrix. Data from each front-end were multiplexed and sent to a DAQ board for digital conversion; a motherboard collects the data and communicates with a host computer through a USB port. Specific tests were carried out on the system in order to assess its performance. Furthermore, we have measured some of the most important parameters of the system for PET applications.
Abstract:
Current trends in the European Higher Education Area (EHEA) are moving towards continuous evaluation of students, replacing the traditional evaluation based on a single test or exam. This fact, together with the increase in the number of students in Engineering Schools over recent years, requires modifying evaluation procedures so that they remain compatible with educational and research activities. This work presents a methodology for the automatic generation of questions. These questions can be used as self-assessment questions by the student and/or as test questions by the teacher. The proposed approach is based on parametric questions, formulated as multiple-choice questions and generated and supported by common spreadsheet and word-processing programs. With this approach, every teacher can apply the proposed methodology without using programs or tools other than those normally used in his/her daily activity.
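A minimal sketch of what parametric multiple-choice generation can look like when done in code rather than with spreadsheets and word processors: a question template is instantiated with random parameter values, the correct answer is computed, and plausible distractors are derived from it. The Ohm's-law template and the distractor rules are invented for the example, not taken from the paper's materials.

```python
# Illustrative generator of parametric multiple-choice questions.
import random

TEMPLATE = ("A resistor of {r} ohm carries a current of {i} A. "
            "What is the voltage across it?")

def make_question(rng):
    r = rng.choice([10, 22, 47, 100, 220])
    i = round(rng.uniform(0.1, 2.0), 1)
    correct = round(r * i, 1)                      # Ohm's law: V = R * I
    distractors = {round(r / i, 1),                # common mistakes as distractors
                   round(r + i, 1),
                   round(correct * 10, 1)}
    distractors.discard(correct)
    options = [correct] + list(distractors)[:3]
    rng.shuffle(options)
    return {"stem": TEMPLATE.format(r=r, i=i),
            "options": [f"{v} V" for v in options],
            "answer": options.index(correct)}

rng = random.Random(42)
for q in (make_question(rng) for _ in range(3)):
    print(q["stem"])
    for k, opt in enumerate(q["options"]):
        print(f"  {'abcd'[k]}) {opt}", "(correct)" if k == q["answer"] else "")
```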
Abstract:
The aim of this project is to design a system capable of controlling the rotation speed of a DC motor as a function of the temperature value obtained from a sensor. To this end, a PWM signal is generated with a microcontroller, with a duty cycle that depends on the measured temperature. The design phase has two clearly differentiated parts, hardware and software. The hardware design can in turn be divided in two. First, it was necessary to design the circuitry needed to adapt the voltage levels delivered by the temperature sensor to the levels required by the ADC, which digitizes the signal for later processing by the microcontroller. A circuit therefore had to be designed to correct the offset and slope of the sensor's voltage-temperature curve so as to fit it to the input range of the ADC. Second, the circuit in charge of controlling the rotation speed of the motor had to be designed. This circuit is based on a MOSFET transistor operated as a switch, driven by the PWM signal mentioned above. In this way, varying the duty cycle of the PWM signal proportionally varies the voltage across the motor, and therefore its rotation speed. Regarding the software design, the microcontroller was programmed to generate a PWM signal on one of its pins as a function of the value delivered by the ADC, whose input is connected to the voltage produced by the sensor-conditioning circuit. The microcontroller is also used to show the measured temperature on an LCD display. An mbed development board, which includes the integrated microcontroller, was chosen for this project because it simplifies prototyping. Both parts were then integrated, and the system was tested to verify its correct operation. Since the result depends on the measured temperature, temperature variations had to be simulated in order to check the results obtained at different temperatures; a hot-air blower was used for this purpose. Once correct operation had been verified, the printed circuit board was designed as a final step. In conclusion, a system with an acceptable level of accuracy and precision, given the limitations of the system, was developed. SUMMARY: It is obvious that day by day people's daily life depends more on technology and science. Tasks tend to be done automatically, making them simpler and, as a result, user life is more comfortable. Every single task that can be controlled has an electronic system behind it. In this project, a control system based on a microcontroller was designed for a fan, allowing it to go faster when temperature rises or to slow down as the environment gets colder. For this purpose, a microcontroller was programmed to generate a signal to control the rotation speed of the fan depending on the data acquired from a temperature sensor. After testing the whole design developed in the laboratory, the next step taken was to build a prototype, which allows future improvements in the system that are discussed in the corresponding section of the thesis.
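The core of the design reduces to two linear mappings, sensor voltage to ADC range and temperature to PWM duty cycle; the sketch below shows that arithmetic in plain Python with invented sensor and ADC parameters, and is not the mbed firmware itself.

```python
# Sketch of the two linear mappings in the design (illustrative values only):
# 1) condition the sensor voltage into the ADC range (offset + gain correction),
# 2) map the measured temperature to a PWM duty cycle.
# Plain Python for clarity, not the mbed firmware.

# Assumed sensor: 0.5 V at 0 degC, 10 mV/degC (values invented for the example).
V_OFFSET, V_PER_DEGC = 0.5, 0.010
ADC_VREF, ADC_BITS = 3.3, 12
T_MIN, T_MAX = 20.0, 60.0          # fan off below T_MIN, full speed above T_MAX

def conditioned_voltage(t_degc, gain=4.0, offset=-2.0):
    """Analog stage: v_out = gain * v_sensor + offset, chosen so that the
    temperature range of interest spans most of the 0..ADC_VREF window."""
    v_sensor = V_OFFSET + V_PER_DEGC * t_degc
    return min(max(gain * v_sensor + offset, 0.0), ADC_VREF)

def adc_code(v):
    return round(v / ADC_VREF * (2 ** ADC_BITS - 1))

def duty_cycle(t_degc):
    """PWM duty cycle in [0, 1], linear between T_MIN and T_MAX."""
    return min(max((t_degc - T_MIN) / (T_MAX - T_MIN), 0.0), 1.0)

for t in (15, 25, 40, 55, 70):
    print(f"{t:2d} degC -> ADC {adc_code(conditioned_voltage(t)):4d}, "
          f"duty {duty_cycle(t):.2f}")
```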
Abstract:
Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important such techniques and has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learning a classification model or classifier from available training data, and second, classifying new incoming unseen data instances using the learned classifier. 
Classification is supervised when all class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when all class values are missing in the training data (i.e., unlabeled data). Besides this taxonomy, the classification problem can be categorized into uni-dimensional or multi-dimensional depending on the number of class variables, one or more, respectively; or into stationary or streaming depending on the characteristics of the data and the rate of change underlying it. Throughout this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised uni-dimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we mainly used Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data. They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson's Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against methods commonly used for the Parkinson's disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both case studies, results are promising in terms of classification accuracy as well as in the analysis of the learned MBC graphical structures, which identify known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their highly rapid generation process and their concept-drifting aspect. That is, the learned concepts and/or the underlying distribution are likely changing and evolving over time, which makes the current classification model out of date and requires it to be updated. CPL-DS uses the Kullback-Leibler divergence and a bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional, or dual. Then, if any drift occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. CPL-DS is general, as it can be applied to several classification models. Using two different models, namely, the naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where newly received files must be continuously classified as malware or goodware. 
Experimental results show that our approach is effective for detecting different kinds of drift from partially labeled data streams, while also achieving good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. An experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
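As an illustration of the drift monitoring described for LA-MB-MBC and GA-MB-MBC, the sketch below applies a generic Page-Hinkley detector to a stream of per-batch average log-likelihood scores; the data, delta, and threshold values are made up, and the sketch is not tied to the MBC models themselves.

```python
# Generic Page-Hinkley change detector applied to a stream of per-batch average
# log-likelihood scores; data and thresholds are illustrative.
import numpy as np

class PageHinkley:
    def __init__(self, delta=0.02, threshold=1.0):
        self.delta = delta          # tolerated drift magnitude
        self.threshold = threshold  # alarm threshold (lambda)
        self.mean = 0.0
        self.n = 0
        self.cum = 0.0              # cumulative deviation below the running mean
        self.min_cum = 0.0

    def update(self, x):
        """Feed one score; return True if a (downward) change is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        # A drop in average log-likelihood signals drift, so accumulate
        # deviations below the running mean.
        self.cum += self.mean - x - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold

rng = np.random.default_rng(3)
# Average log-likelihood per batch: stable, then degrading after batch 60 (drift).
scores = np.concatenate([rng.normal(-2.0, 0.05, 60),
                         rng.normal(-2.6, 0.05, 40)])

ph = PageHinkley(delta=0.02, threshold=1.0)
for t, s in enumerate(scores):
    if ph.update(s):
        print(f"drift detected at batch {t}; relearn or adapt the classifier here")
        break
```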
Abstract:
Generation of Fission Yield covariance data and application to Fission Pulse Decay Heat calculations
Abstract:
Loads acting on a saturated soil produce, depending on the nature of the loading, the type of soil, and the drainage conditions of the ground, an increase in the pressure of the pore water. The generation of this pore pressure has been studied using normally consolidated soft soil samples taken from the subsoil of the Port of Barcelona. Data from a comprehensive experimental campaign using the simple shear test apparatus were used and, after appropriate interpretation of the test, the aspects considered key to the process of pore pressure generation in the soil under the different loading conditions were identified. In conclusion, a generalization is proposed of the classical equation that Skempton formulated almost 60 years ago, which allows the pore pressure generation in the triaxial test apparatus to be interpreted.
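For reference, since the abstract cites it without stating it, the classical Skempton (1954) relation for the pore pressure increment in the triaxial test, in terms of the total principal stress increments and the pore pressure coefficients A and B, is:

```latex
\Delta u \;=\; B\,\bigl[\,\Delta\sigma_3 \;+\; A\,(\Delta\sigma_1 - \Delta\sigma_3)\,\bigr]
```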