965 results for interval-censored data


Relevance:

30.00%

Abstract:

In an increasing number of applications (e.g., in embedded, real-time, or mobile systems) it is important or even essential to ensure conformance with respect to a specification expressing resource usages, such as execution time, memory, energy, or user-defined resources. In previous work we presented a novel framework for data size-aware, static resource usage verification. Specifications can include both lower- and upper-bound resource usage functions. In order to statically check such specifications, both upper- and lower-bound resource usage functions (on input data sizes) approximating the actual resource usage of the program are automatically inferred and compared against the specification. The outcome of the static checking of assertions can express intervals for the input data sizes such that a given specification can be proved for some intervals but disproved for others. After an overview of the approach, in this paper we provide a number of novel contributions: we present a full formalization, and we report on and provide results from an implementation within the Ciao/CiaoPP framework (which provides a general, unified platform for static and run-time verification, as well as unit testing). We also generalize the checking of assertions to allow preconditions expressing intervals within which the input data size of a program is supposed to lie (i.e., intervals for which each assertion is applicable), and we extend the class of resource usage functions that can be checked.
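The interval-valued outcome described in this abstract can be illustrated with a minimal sketch. It is not the CiaoPP implementation: the real framework compares bound functions symbolically, whereas this sketch simply enumerates integer input sizes. The function name `check_upper_spec` and the example bounds are hypothetical.

```python
from typing import Callable, List, Tuple

Bound = Callable[[int], float]


def check_upper_spec(spec_ub: Bound, inferred_lb: Bound, inferred_ub: Bound,
                     n_min: int, n_max: int) -> List[Tuple[int, int, str]]:
    """Classify each input size n in [n_min, n_max] and merge equal neighbours.

    Assumption: sizes are checked one by one (a numerical stand-in for the
    symbolic function comparison described in the abstract).
    """
    intervals: List[Tuple[int, int, str]] = []
    for n in range(n_min, n_max + 1):
        if inferred_ub(n) <= spec_ub(n):
            verdict = "proved"       # even the inferred worst case meets the spec
        elif inferred_lb(n) > spec_ub(n):
            verdict = "disproved"    # even the inferred best case violates the spec
        else:
            verdict = "unknown"      # inferred bounds are too loose to decide
        if intervals and intervals[-1][2] == verdict:
            start, _, _ = intervals[-1]
            intervals[-1] = (start, n, verdict)   # extend the current interval
        else:
            intervals.append((n, n, verdict))
    return intervals


# Hypothetical example: the spec allows at most 10*n steps, and the analysis
# infers that the program takes between n*n and n*n + 5 steps.
result = check_upper_spec(lambda n: 10 * n,
                          lambda n: n * n,
                          lambda n: n * n + 5,
                          1, 20)
print(result)
```

In this example the specification is proved for small input sizes, undecided at the crossover, and disproved for larger sizes, which is the kind of per-interval verdict the abstract refers to.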

Relevance:

30.00%

Abstract:

Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually, and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important of these techniques and has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from available training data, and second, classify new, unseen data instances using the learned classifier. Classification is supervised when all class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when all class values are missing from the training data (i.e., unlabeled data). Besides this taxonomy, the classification problem can be categorized as uni-dimensional or multi-dimensional depending on the number of class variables (one or more, respectively), and as stationary or streaming depending on the characteristics of the data and the rate of change underlying it. Throughout this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised uni-dimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we basically used Bayesian network classifiers as models.
The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data. They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of the human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson’s Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against commonly used methods for solving the Parkinson’s disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both case studies, results are promising in terms of classification accuracy as well as regarding the analysis of the learned MBC graphical structures, which identify known and novel interactions among variables.
The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their very rapid generation process and their concept-drifting aspect. That is, the learned concepts and/or the underlying distribution are likely to change and evolve over time, which makes the current classification model out of date and in need of updating. CPL-DS uses the Kullback-Leibler divergence and the bootstrapping method to quantify and detect three possible kinds of drift: feature drift (a change in the predictors), conditional drift (a change in the class posterior), or dual drift (both). Then, if any drift occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. CPL-DS is general, as it can be applied to several classification models. Using two different models, namely, the naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where newly received files must be continuously classified as malware or goodware. Experimental results show that our approach is effective for detecting different kinds of drift from partially labeled data streams and also achieves good classification performance.
Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely, Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor concept drift over time using the average log-likelihood score and the Page-Hinkley test. Then, if a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. The experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
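The drift monitoring mentioned for LA-MB-MBC and GA-MB-MBC (an average log-likelihood score checked with the Page-Hinkley test) can be sketched as follows. This is a generic textbook form of the Page-Hinkley test for a sustained decrease in a monitored score, not the thesis code. The function name, the tolerance `delta`, the alarm `threshold`, and the synthetic scores are all assumptions chosen for illustration.

```python
from typing import Iterable, Optional


def page_hinkley_drop(scores: Iterable[float], delta: float = 0.05,
                      threshold: float = 2.0) -> Optional[int]:
    """Return the zero-based index at which a sustained decrease is flagged.

    Returns None when no alarm is raised. `delta` is the tolerated magnitude
    of change and `threshold` is the alarm level (both hypothetical defaults).
    """
    mean = 0.0      # running mean of the scores seen so far
    cum = 0.0       # cumulative deviation from the running mean
    cum_max = 0.0   # largest cumulative deviation observed so far
    for t, x in enumerate(scores, start=1):
        mean += (x - mean) / t
        cum += x - mean + delta
        cum_max = max(cum_max, cum)
        # A large gap between the running maximum and the current value
        # means the scores have been falling below their mean for a while.
        if cum_max - cum > threshold:
            return t - 1
    return None


# Synthetic average log-likelihood scores: stable at -1.0, then dropping to -3.0.
stable = page_hinkley_drop([-1.0] * 100)
detected = page_hinkley_drop([-1.0] * 50 + [-3.0] * 50)
print(stable, detected)
```

On the stable stream no alarm is raised, while on the second stream the alarm fires shortly after the change at position 50. In an adaptive classifier, such an alarm would be the trigger for local adaptation or for relearning the model.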

Relevance:

30.00%

Abstract:

Along the Apulian Adriatic coast, in a cliff south of Trani, a succession of three superimposed units of marine and/or paralic environments has been recognised. The lowest, unit I, is characterised by calcareous/siliciclastic sands (css), micritic limestones (ml), stromatolitic and characean boundstones (scb), and characean calcarenites (cc). The sedimentary environment passes from shallow marine, with low energy and temporary episodes of subaerial exposure, to lagoonal with few exchanges with the sea. The lagoonal stromatolites (scb subunit) grew during a long period of relative stability of a high sea level in a tropical climate. Unit I is truncated at the top by an erosion surface overlain by unit II, which consists of a basal pebble lag (bpl), siliciclastic sands (ss), calcareous sands (cs), characean boundstones (cb), and a brown paleosol (bp). The sedimentary environment varies from beach to lagoon with salinity variations. Although there are indications of seismic events within subunit cs, the deposition of unit II took place in a context of relative stability. Unit II is referable to a sea level highstand. Unit III, transgressive on the preceding unit, consists of white calcareous sands (wcs), calcareous sands and calcarenites (csc), phytoclastic calcirudite and phytohermal travertine (pcpt), mixed deposits (csl, m, k, c), sands (s) and red/brown paleosols (rbp). The sedimentation of this unit was affected by synsedimentary tectonics, attested by seismites found at several heights. Unit III is also referable to a sea level highstand. The scientific literature has so far generally attributed the deposits of the Trani cliff to the Tyrrhenian (auct.). As part of this work, 10 samples were dated using the amino acid racemization (AAR) method applied to ostracod carapaces. Four of these samples were rejected because they showed signs of recent contamination in laboratory analysis. 
The numerical ages indicate that the deposits of the Trani cliff are older than MIS 5. The upper part of unit I has been dated to 355±85 ka BP, which allows the lowest stromatolitic subunit (scb) to be assigned to the MIS 11 peak and the top of unit I to the MIS 11-MIS 10 interval. The base of unit II has been dated to 333±118 ka BP, which attributes the erosion surface bounding units I and II to the MIS 10 lowstand and the lower part of unit II to MIS 9.3. The upper part of unit II has been dated to 234±35 ka BP, while three other numerical ages come from unit III: 303±35, 267±51 and 247±61 ka BP. At present, the numerical ages cannot distinguish the sedimentation ages of units II and III, which are both related to the MIS 9.3-MIS 7.1 time range. However, the superimposed position of the units and their respective ages allow us to recognise a subsidence phase between MIS 11 and MIS 7, followed by an uplift phase between MIS 7 and the present day, which brought the deposits to their current position. This tectonic pattern is not in full agreement with what is described in the literature for the Apulian foreland.

Relevance:

30.00%

Abstract:

We propose a general procedure for solving incomplete data estimation problems. The procedure can be used to find the maximum likelihood estimate or to solve estimating equations in difficult cases such as estimation with the censored or truncated regression model, the nonlinear structural measurement error model, and the random effects model. The procedure is based on the general principle of stochastic approximation and the Markov chain Monte Carlo method. Applying the theory of adaptive algorithms, we derive conditions under which the proposed procedure converges. Simulation studies also indicate that the proposed procedure consistently converges to the maximum likelihood estimate for the structural measurement error logistic regression model.
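To make the idea of stochastic approximation on incomplete data concrete, here is a toy sketch for a much simpler problem than those in the abstract: estimating the mean of a unit-variance normal sample that is right-censored at a known point `c`. It is not the authors' procedure. In particular, the censored values are redrawn by exact inverse-CDF sampling, which stands in for the Markov chain Monte Carlo step, and the gain sequence 1/k, the function names and the simulated data are all assumptions.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def sample_above(theta: float, c: float, rng: random.Random) -> float:
    """Draw from N(theta, 1) conditioned on being at least c (inverse-CDF method)."""
    lo = STD_NORMAL.cdf(c - theta)
    u = rng.uniform(lo, 1.0)
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # keep inv_cdf strictly inside (0, 1)
    return max(c, theta + STD_NORMAL.inv_cdf(u))


def sa_censored_mean(y, c, n_iter=2000, seed=0):
    """Robbins-Monro iteration for the mean of right-censored N(theta, 1) data.

    `y` holds min(x_i, c); entries equal to c are treated as censored.
    """
    rng = random.Random(seed)
    exact_sum = sum(v for v in y if v < c)
    n_cens = sum(1 for v in y if v >= c)
    theta = sum(y) / len(y)               # naive start: censored values set to c
    for k in range(1, n_iter + 1):
        # Simulate the missing values given the current parameter value.
        imputed = sum(sample_above(theta, c, rng) for _ in range(n_cens))
        complete_mean = (exact_sum + imputed) / len(y)
        # Noisy score of the complete data, averaged in with gain 1/k.
        theta += (complete_mean - theta) / k
    return theta


# Simulated data (assumption): true mean 1.0, censoring point 1.5.
data_rng = random.Random(1)
c = 1.5
y = [min(data_rng.gauss(1.0, 1.0), c) for _ in range(200)]
naive = sum(y) / len(y)
estimate = sa_censored_mean(y, c)
print(naive, estimate)
```

The naive mean is biased downward because censored observations are recorded as `c`. The iteration corrects for this by repeatedly simulating the censored values under the current estimate and taking a shrinking step toward the resulting complete-data mean.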

Relevance:

30.00%

Abstract:

Niemann–Pick disease type C (NP-C) is an autosomal recessive lipidosis linked to chromosome 18q11–12, characterized by lysosomal accumulation of unesterified cholesterol and delayed induction of cholesterol-mediated homeostatic responses. This cellular phenotype is identifiable cytologically by filipin staining and biochemically by measurement of low-density lipoprotein-derived cholesterol esterification. The mutant Chinese hamster ovary cell line (CT60), which displays the NP-C cellular phenotype, was used as the recipient for a complementation assay after somatic cell fusions with normal and NP-C murine cells suggested that this Chinese hamster ovary cell line carries an alteration(s) in the hamster homolog(s) of NP-C. To narrow rapidly the candidate interval for NP-C, three overlapping yeast artificial chromosomes (YACs) spanning the 1 centimorgan human NP-C interval were introduced stably into CT60 cells and analyzed for correction of the cellular phenotype. Only YAC 911D5 complemented the NP-C phenotype, as evidenced by cytological and biochemical analyses, whereas no complementation was obtained from the other two YACs within the interval or from a YAC derived from chromosome 7. Fluorescent in situ hybridization indicated that YAC 911D5 was integrated at a single site per CT60 genome. These data substantially narrow the NP-C critical interval and should greatly simplify the identification of the gene responsible in mouse and man. This is the first demonstration of YAC complementation as a valuable adjunct strategy for positional cloning of a human gene.

Relevance:

30.00%

Abstract:

Proportion correct in two-alternative forced-choice (2AFC) detection tasks often varies depending on whether the stimulus is presented in the first or in the second interval. Reanalysis of published data reveals that these order effects (or interval bias) are strong and prevalent, refuting the standard difference model of signal detection theory. Order effects are commonly regarded as evidence that observers use an off-center criterion under the difference model with bias. We consider an alternative difference model with indecision whereby observers are occasionally undecided and guess with some bias toward one of the response options. Whether or not the data show order effects, the two models fit 2AFC data indistinguishably, but they yield meaningfully different estimates of sensory parameters. Under indeterminacy as to which model governs 2AFC performance, parameter estimates are suspect and potentially misleading. The indeterminacy can be circumvented by modifying the response format so that observers can express indecision when needed. Reanalysis of published data collected in this way lends support to the indecision model. We illustrate alternative approaches to fitting psychometric functions under the indecision model and discuss designs for 2AFC experiments that improve the accuracy of parameter estimates, whether or not order effects are apparent in the data.
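How indecision with biased guessing can produce order effects may be easier to see in a small numerical sketch. The parameterization below is an assumption made for illustration and is not taken from the paper: the decision variable D (second interval minus first) is modelled as normal with unit standard deviation and mean +d or -d, the observer is undecided when |D| <= delta, and guesses "first" with probability xi when undecided.

```python
from statistics import NormalDist


def proportion_correct(d: float, delta: float, xi: float):
    """Return (P(correct | signal first), P(correct | signal second)).

    d     : mean of D when the signal is in the second interval (assumed units)
    delta : half-width of the indecision region |D| <= delta
    xi    : probability of guessing "first" when undecided
    """
    signal_second = NormalDist(mu=d, sigma=1.0)
    # By symmetry, both quantities below are the same for either signal position.
    p_undecided = signal_second.cdf(delta) - signal_second.cdf(-delta)
    p_sure_correct = 1.0 - signal_second.cdf(delta)
    pc_first = p_sure_correct + xi * p_undecided
    pc_second = p_sure_correct + (1.0 - xi) * p_undecided
    return pc_first, pc_second


unbiased = proportion_correct(d=1.0, delta=0.5, xi=0.5)
biased = proportion_correct(d=1.0, delta=0.5, xi=0.8)
print(unbiased, biased)
```

Under these assumptions the two proportions differ by (2*xi - 1) times the probability of being undecided, so an order effect appears whenever guesses are biased and vanishes when xi = 0.5 or delta = 0.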