949 results for monotone missing data
Abstract:
METHODS Spirometry datasets from South-Asian children were collated from four centres in India and five within the UK. Records with transcription errors, missing values for height or spirometry, and implausible values were excluded (n = 110). RESULTS Following exclusions, cross-sectional data were available for 8,124 children (56.3% male; 5-17 years). When compared with GLI-predicted values for White Europeans, forced expiratory volume in 1 s (FEV1) and forced vital capacity (FVC) in South-Asian children were on average 15% lower, ranging from 4% to 19% between centres. By contrast, proportional reductions in FEV1 and FVC within all but two datasets meant that the FEV1/FVC ratio remained independent of ethnicity. The 'GLI-Other' equation fitted data from North India reasonably well, while the 'GLI-Black' equations provided a better approximation for South-Asian data than the 'GLI-White' equation. However, marked discrepancies in mean lung function z-scores between centres, especially when examined according to socio-economic conditions, precluded derivation of a single South-Asian GLI adjustment. CONCLUSION Until improved and more robust prediction equations can be derived, we recommend the use of the 'GLI-Black' equations for interpreting most South-Asian data, although 'GLI-Other' may be more appropriate for North Indian data. Prospective data collection using standardised protocols is urgently required to explore potential sources of variation due to socio-economic circumstances, secular changes in growth/predictors of lung function, and ethnicities within the South-Asian classification.
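The GLI reference equations underlying these comparisons express predicted lung function with the LMS method, so the 'z-scores' reported above are Box-Cox-type standardised residuals. A minimal Python sketch of that conversion is shown below; the L, M and S values are purely hypothetical (the real coefficients depend on sex, age, height and the chosen GLI module).

import math

def lms_zscore(measured, L, M, S):
    # LMS (Cole) z-score, the form used by the GLI-2012 reference equations.
    # L: skewness (Box-Cox power), M: predicted median, S: coefficient of variation.
    if L == 0:
        return math.log(measured / M) / S
    return ((measured / M) ** L - 1.0) / (L * S)

# Hypothetical example: predicted (median) FEV1 of 2.00 L with L = 1.0 and S = 0.12;
# a measured FEV1 of 1.70 L (15% below predicted) gives z = -1.25.
print(lms_zscore(1.70, L=1.0, M=2.00, S=0.12))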
Commercial Sexual Exploitation and Missing Children in the Coastal Region of Sao Paulo State, Brazil
Abstract:
The commercial sexual exploitation of children (CSEC) has emerged as one of the world's most heinous crimes. The problem affects millions of children worldwide, and no country or community is fully immune from its effects. This paper reports first-generation research on the relationship between CSEC and the phenomenon of missing children living in and around the coastal region of the state of Sao Paulo, Brazil, the country's richest state. Data are reported from interviews and case records of 64 children and adolescents who were receiving care through a major youth-serving non-governmental organization (NGO) located in the coastal city of Sao Vicente. Data about missing children and adolescents were also collected from police reports (858 in total). In Brazil, prostitution is not itself a crime; however, the exploitation of prostitution is. The police therefore hold no information about children or adolescents in this situation, only about the clients and exploiters. Thus, this investigation sought to accomplish two objectives: 1) to establish the relationship between missing and sexually exploited children; and 2) to sensitize police and child-serving authorities in both the governmental and non-governmental sectors to the nature, extent, and seriousness of the many unrecognized cases of CSEC and missing children that come to their attention. The results indicate that police reports of missing children significantly underestimate the problem: they do not reflect the number of children who run away and/or are involved in commercial sexual exploitation.
Abstract:
Strontium isotope stratigraphy was used to date five discrete horizons within the CRP-3 drillhole. A single in situ modiolid bivalve fragment at 10.88 mbsf gives an age of 30.9 (±0.8) Ma for the associated sediment. The four remaining well-preserved fragments, recovered from 29.94-190.31 mbsf, are within error of this age, indicating a high sedimentation rate and suggesting that little time is missing in disconformities. Diagenetic alteration of carbonate macrofossils by continental fluids (and possibly seawater) is a common feature down to 320 mbsf.
Abstract:
The first 1400-year floating varve chronology for north-eastern Germany covering the late Allerød to the early Holocene has been established by microscopic varve counts from the Rehwiese palaeolake sediment record. The Laacher See Tephra (LST), at the base of the studied interval, forms the tephrochronological anchor point. The fine laminations were examined using a combination of micro-facies and µ-XRF analyses and are typical of calcite varves, which in this case provide mainly a warm-season signal. Two varve types with different sub-layer structures have been distinguished: (I) complex varves consisting of up to four seasonal sub-layers, formed during the Allerød and early Holocene periods, and (II) simple two-sub-layer varves, occurring only during the Younger Dryas. The precision of the chronology has been improved by varve-to-varve comparison of two independently analysed sediment profiles based on well-defined micro-marker layers. This has enabled both (1) the precise location of single missing varves in one of the sediment profiles and (2) the verification of varve interpolation in disturbed varve intervals in the parallel core. Inter-annual and decadal-scale variability in sediment deposition processes was traced by multi-proxy data series including seasonal layer thickness, high-resolution element scans, and total organic and inorganic carbon data at a five-varve resolution. These data support the idea of a two-phase Younger Dryas, with the first interval (12,675-12,275 varve years BP) characterised by still significant but gradually decreasing warm-season calcite precipitation and a second phase (12,275-11,640 varve years BP) with only weak calcite precipitation. Detailed correlation of these two phases with the Meerfelder Maar record, based on the LST isochrone and independent varve counts, provides clues about regional differences and seasonal aspects of YD climate change along a transect from a location proximal to the North Atlantic in the west to a more continental site in the east.
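The varve-to-varve comparison described above amounts to comparing counts between shared micro-marker layers in the two profiles. A small, hypothetical Python sketch of that cross-check is given below (marker names and counts are invented for illustration): any count mismatch between neighbouring markers flags candidate missing varves in one core.

# Cumulative varve count at each shared micro-marker layer, counted from the top of
# the laminated interval, for two independently counted profiles (hypothetical values).
core_a = {"LST": 0, "M1": 212, "M2": 540, "M3": 918}
core_b = {"LST": 0, "M1": 212, "M2": 539, "M3": 918}

markers = ["LST", "M1", "M2", "M3"]
for top, base in zip(markers, markers[1:]):
    n_a = core_a[base] - core_a[top]
    n_b = core_b[base] - core_b[top]
    if n_a == n_b:
        print(f"{top}-{base}: counts agree ({n_a} varves)")
    else:
        missing_in = "B" if n_b < n_a else "A"
        print(f"{top}-{base}: core A has {n_a}, core B has {n_b} "
              f"-> {abs(n_a - n_b)} varve(s) apparently missing in core {missing_in}")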
Abstract:
Detailed data on land use and land cover constitute important information for Earth system models, environmental monitoring and ecosystem services research. Global land cover products are evolving rapidly; however, there is still a lack of information particularly for heterogeneous agricultural landscapes. We censused land use and land cover field by field in the agricultural mosaic catchment Haean in South Korea. We recorded the land cover types with additional information on agricultural practice. In this paper we introduce the data, their collection and the post-processing protocol. Furthermore, because it is important to quantitatively evaluate available land use and land cover products, we compared our data with the MODIS Land Cover Type product (MCD12Q1). During the studied period, a large portion of dry fields was converted to perennial crops. Compared to our data, the forested area was underrepresented and the agricultural area overrepresented in MCD12Q1. In addition, linear landscape elements such as waterbodies were missing in the MODIS product due to its coarse spatial resolution. The data presented here can be useful for earth science and ecosystem services research.
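A comparison like the one against MCD12Q1 is usually summarised as a per-class cross-tabulation of two co-registered categorical rasters. The Python sketch below is illustrative only (class codes and grids are invented) and assumes the MODIS classes have already been resampled and reclassified onto the census grid.

import numpy as np

def crosstab(reference, product, classes):
    # Confusion matrix (cell counts): rows = reference census classes,
    # columns = classes of the coarser product on the same grid.
    table = np.zeros((len(classes), len(classes)), dtype=int)
    for i, ref_c in enumerate(classes):
        for j, prod_c in enumerate(classes):
            table[i, j] = np.sum((reference == ref_c) & (product == prod_c))
    return table

# Hypothetical grids: 0 = water, 1 = forest, 2 = cropland.
ref = np.array([[1, 1, 2], [1, 2, 2], [0, 2, 2]])
mod = np.array([[1, 2, 2], [2, 2, 2], [2, 2, 2]])
cm = crosstab(ref, mod, classes=[0, 1, 2])
print(cm)
print("forest cells, reference vs product:", cm[1].sum(), cm[:, 1].sum())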
Abstract:
The 'Paleocene/Eocene Thermal Maximum' or PETM (~55 Ma) was associated with dramatic warming of the oceans and atmosphere, pronounced changes in ocean circulation and chemistry, and upheaval of the global carbon cycle. Many relatively complete PETM sequences have by now been reported from around the world, but most are from ancient low- to mid-latitude sites. ODP Leg 189 in the Tasman Sea recovered sediments from this critical phase in Earth history at Sites 1171 and 1172, potentially representing the southernmost PETM successions ever encountered (at ~70° to 65° S paleolatitude). Downhole and core logging data, in combination with dinoflagellate cyst biostratigraphy, magnetostratigraphy, and stable isotope geochemistry, indicate that the sequences at both sites were deposited in a high-accumulation-rate, organic-rich, marginal marine setting. Furthermore, Site 1172 indeed contains a fairly complete P-E transition, whereas at Site 1171 only the lowermost Eocene is recovered. However, at Site 1172 the typical PETM-indicative acme of the dinocyst Apectodinium was not recorded. We conclude that, unfortunately, the critical latest Paleocene and PETM intervals are missing at Site 1172. We relate the missing section to a sea-level-driven hiatus and/or condensed section and to recovery problems. Nevertheless, our integrated records provide a first-ever portrait of the trend toward, and aftermath of, the PETM in a marginal marine, southern high-latitude setting.
Abstract:
The distribution of dissolved aluminium in the West Atlantic Ocean mirrors that of dissolved silicic acid, hinting at intricate interactions between the ocean cycling of Al and Si. The marine biogeochemistry of Al is of interest because of its potential impact on diatom opal remineralisation, and hence on Si availability. Furthermore, the dissolved Al concentration in the surface ocean has been used as a tracer for dust input, dust being the most important source of the bio-essential trace element iron to the ocean. Previously, the dissolved Al concentration was simulated reasonably well with only a dust source and with scavenging by adsorption onto settling biogenic debris as the only removal process. Here we explore the impacts on the modelled Al distribution of (i) a sediment source of Al in the Northern Hemisphere (especially north of ~40° N), (ii) the imposed velocity field, and (iii) biological incorporation of Al. The sediment source clearly improves the model results, and using a different velocity field shows the importance of advection for the simulated Al distribution. Biological incorporation appears to be a potentially important removal process. However, conclusive independent data to constrain the Al/Si incorporation ratio of growing diatoms are missing. Therefore, this study does not provide a definitive answer to the question of the relative importance of Al removal by incorporation compared with removal by adsorptive scavenging.
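The source/removal budget sketched above can be written as a single-box mass balance, d[Al]/dt = dust supply + sediment supply - (scavenging + biological incorporation) * [Al]. The Python sketch below is purely illustrative, with hypothetical fluxes and rate constants; it is not the configuration of the ocean model used in the study.

# Hypothetical single-box budget for dissolved Al.
# Concentration in nmol m^-3, supply fluxes in nmol m^-3 yr^-1, rate constants in yr^-1.
F_dust = 2.0     # supply from dust dissolution
F_sed  = 0.5     # supply from sediments / pore waters
k_scav = 0.05    # first-order scavenging onto settling biogenic debris
r_bio  = 0.01    # first-order loss by incorporation into diatom opal

def step(al, dt=1.0):
    # Forward-Euler update of the box concentration over one time step (years).
    return al + dt * (F_dust + F_sed - (k_scav + r_bio) * al)

al = 0.0
for _ in range(500):   # integrate toward steady state
    al = step(al)
print(al, (F_dust + F_sed) / (k_scav + r_bio))   # numerical vs analytical steady state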
Abstract:
The nature of Re-platinum-group element (PGE; Pt, Pd, Ir, Os, Ru) transport in the marine environment was investigated by means of marine sediments at and across the Cretaceous-Tertiary boundary (KTB) at two hemipelagic sites in Europe and two pelagic sites in the North and South Pacific. A traverse across the KTB in the South Pacific pelagic clay core found elevated levels of Re, Pt, Ir, Os, and Ru, each of which is approximately symmetrically distributed over a distance of ~1.8 m across the KTB. The Re-PGE abundance patterns are fractionated from chondritic relative abundances: Ru, Pt, Pd, and Re contents are slightly subchondritic relative to Ir, and Os is depleted by ~95% relative to chondritic Ir proportions. A similar depletion in Os (~90%) was found in a sample of the pelagic KTB in the North Pacific, but it is enriched in Ru, Pt, Pd, and Re relative to Ir. The two hemipelagic KTB clays have near-chondritic abundance patterns. The ~1.8-m-wide Re-PGE peak in the pelagic South Pacific section cannot be reconciled with the fallout of a single impactor, indicating that postdepositional redistribution has occurred. The elemental profiles appear to fit diffusion profiles, although bioturbation could have also played a role. If diffusion had occurred over ~65 Ma, the effective diffusivities are ~10⁻¹³ cm²/s, much smaller than those of soluble cations in pore waters (~10⁻⁶ cm²/s). The coupling of Re and the PGEs during redistribution indicates that postdepositional processes did not significantly fractionate their relative abundances. If redistribution was caused by diffusion, then the effective diffusivities of Re and the PGEs are the same. Fractionation of Os from Ir during the KTB interval must therefore have occurred during aqueous transport in the marine environment. Distinctly subchondritic Os/Ir ratios throughout the Cenozoic in the South Pacific core further suggest that fractionation of Os from Ir in the marine environment is a general process throughout geologic time, because most of the inputs of Os and Ir into the ocean have Os/Ir ratios ≥ 1. Mass balance calculations show that Os and Re burial fluxes in pelagic sediments account for only a small fraction of the riverine Os (<10%) and Re (<0.1%) inputs into the oceans. In contrast, burial of Ir in pelagic sediments is similar to the riverine Ir input, indicating that pelagic sediments are a much larger repository for Ir than for Os and Re. If all of the missing Os and Re is assumed to reside in anoxic sediments on oceanic margins, the calculated burial fluxes in anoxic sediments are similar to observed burial fluxes. However, putting all of the missing Os and Re into estuarine sediments would require high concentrations to balance the riverine input and would also fail to explain the depletion of Os at pelagic KTB sites, where at most ~25% of the K-T impactor's Os could have passed through estuaries. If Os is preferentially sequestered in anoxic marine environments, it follows that the Os/Ir ratio of pelagic sediments should be sensitive to changes in the rates of anoxic sediment deposition. There is thus a clear fractionation of Os and Re from Ir during precipitation out of sea water in pelagic sections. Accordingly, it is inferred here that Re and Os are removed from sea water in anoxic marine depositional regimes.
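The effective-diffusivity argument rests on the one-dimensional scaling sigma^2 ≈ 2Dt for spreading from a thin layer, i.e. D ≈ sigma^2 / (2t). The short Python check below uses the rounded numbers quoted above (65 Myr; a peak spread of a few decimetres up to the ~0.9 m half-width, the exact value depending on how the profile is fitted) and shows why any reasonable choice falls many orders of magnitude below pore-water diffusivities.

SECONDS_PER_YEAR = 3.156e7
t = 65e6 * SECONDS_PER_YEAR                      # ~2.1e15 s

for sigma_cm in (30.0, 90.0):                    # assumed Gaussian spreads of the peak
    D = sigma_cm ** 2 / (2.0 * t)                # 1-D diffusion: sigma^2 ~ 2 D t
    print(f"sigma = {sigma_cm:4.0f} cm  ->  D ~ {D:.1e} cm^2/s")

# The two estimates (~2e-13 and ~2e-12 cm^2/s) lie within one to two orders of magnitude
# of the fitted ~1e-13 cm^2/s, and six to seven orders of magnitude below the ~1e-6 cm^2/s
# typical of dissolved cations in pore water.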
Abstract:
The Linked Data initiative offers a straightforward method for publishing structured data on the World Wide Web and linking it to other data, resulting in a worldwide network of semantically codified data known as the Linked Open Data cloud. The size of the Linked Open Data cloud, i.e. the amount of data published using Linked Data principles, is growing exponentially, including life sciences data. However, key information for biological research is still missing from the Linked Open Data cloud. For example, the relation between orthologous genes and genetic diseases is absent, even though such information can be used for hypothesis generation regarding human diseases. The OGOLOD system, an extension of the OGO Knowledge Base, publishes ortholog/disease information as Linked Data. This gives scientists the ability to query the structured information in connection with other Linked Data and to discover new information related to orthologs and human diseases in the cloud.
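The kind of question such ortholog/disease Linked Data enables can be illustrated with a small rdflib query over an in-memory RDF graph. The namespace and predicates below are hypothetical placeholders, not the actual OGOLOD vocabulary or endpoint.

from rdflib import Graph

# Tiny hypothetical graph: a model-organism gene, its human ortholog, and a disease link.
ttl = """
@prefix ex: <http://example.org/ogolod-like/> .
ex:fly_gene1    ex:orthologousTo  ex:HUMAN_GENE_A .
ex:HUMAN_GENE_A ex:associatedWith ex:DiseaseX .
"""
g = Graph()
g.parse(data=ttl, format="turtle")

query = """
PREFIX ex: <http://example.org/ogolod-like/>
SELECT ?humanGene ?disease WHERE {
    ?gene      ex:orthologousTo  ?humanGene .
    ?humanGene ex:associatedWith ?disease .
}
"""
for human_gene, disease in g.query(query):
    print(human_gene, "->", disease)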
Abstract:
Nowadays, with the ongoing and rapid evolution of information technology and computing devices, large volumes of data are continuously collected and stored in different domains and through various real-world applications. Extracting useful knowledge from such a huge amount of data usually cannot be performed manually and requires the use of adequate machine learning and data mining techniques. Classification is one of the most important such techniques and has been successfully applied to several areas. Roughly speaking, classification consists of two main steps: first, learn a classification model or classifier from available training data, and second, classify new incoming unseen data instances using the learned classifier.
Classification is supervised when all class values are present in the training data (i.e., fully labeled data), semi-supervised when only some class values are known (i.e., partially labeled data), and unsupervised when all class values are missing from the training data (i.e., unlabeled data). Besides this taxonomy, the classification problem can be categorized as uni-dimensional or multi-dimensional depending on the number of class variables (one or more, respectively), and as stationary or streaming depending on the characteristics of the data and the underlying rate of change. Throughout this thesis, we deal with the classification problem under three different settings, namely, supervised multi-dimensional stationary classification, semi-supervised uni-dimensional streaming classification, and supervised multi-dimensional streaming classification. To accomplish this task, we mainly use Bayesian network classifiers as models. The first contribution, addressing the supervised multi-dimensional stationary classification problem, consists of two new methods for learning multi-dimensional Bayesian network classifiers from stationary data. They are proposed from two different points of view. The first method, named CB-MBC, is based on a wrapper greedy forward selection approach, while the second one, named MB-MBC, is a filter constraint-based approach based on Markov blankets. Both methods are applied to two important real-world problems, namely, the prediction of human immunodeficiency virus type 1 (HIV-1) reverse transcriptase and protease inhibitors, and the prediction of the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson's Disease Questionnaire (PDQ-39). The experimental study includes comparisons of CB-MBC and MB-MBC against state-of-the-art multi-dimensional classification methods, as well as against methods commonly used for the Parkinson's disease prediction problem, namely, multinomial logistic regression, ordinary least squares, and censored least absolute deviations. For both case studies, results are promising in terms of classification accuracy as well as regarding the analysis of the learned MBC graphical structures, which identify known and novel interactions among variables. The second contribution, addressing the semi-supervised uni-dimensional streaming classification problem, consists of a novel method (CPL-DS) for classifying partially labeled data streams. Data streams differ from stationary data sets in their very rapid generation process and their concept-drifting aspect. That is, the learned concepts and/or the underlying distribution are likely changing and evolving over time, which makes the current classification model out of date and requires it to be updated. CPL-DS uses the Kullback-Leibler divergence and the bootstrapping method to quantify and detect three possible kinds of drift: feature, conditional, or dual. Then, if any occurs, a new classification model is learned using the expectation-maximization algorithm; otherwise, the current classification model is kept unchanged. CPL-DS is general, as it can be applied to several classification models. Using two different models, namely, the naive Bayes classifier and logistic regression, CPL-DS is tested with synthetic data streams and applied to the real-world problem of malware detection, where newly received files must be continuously classified as malware or goodware.
Experimental results show that our approach is effective for detecting different kinds of drift from partially labeled data streams while maintaining good classification performance. Finally, the third contribution, addressing the supervised multi-dimensional streaming classification problem, consists of two adaptive methods, namely Locally Adaptive-MB-MBC (LA-MB-MBC) and Globally Adaptive-MB-MBC (GA-MB-MBC). Both methods monitor concept drift over time using the average log-likelihood score and the Page-Hinkley test. If a drift is detected, LA-MB-MBC adapts the current multi-dimensional Bayesian network classifier locally around each changed node, whereas GA-MB-MBC learns a new multi-dimensional Bayesian network classifier from scratch. An experimental study carried out using synthetic multi-dimensional data streams shows the merits of both proposed adaptive methods.
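The drift-monitoring step of LA-MB-MBC and GA-MB-MBC combines the average log-likelihood score with the Page-Hinkley test. Below is a generic Python sketch of a Page-Hinkley detector for a drop in a monitored score; the delta and lambda parameters are illustrative, not the settings used in the thesis.

import random

class PageHinkley:
    # Page-Hinkley change detector for a drop in a monitored score
    # (e.g. the per-batch average log-likelihood of the current classifier).
    def __init__(self, delta=0.05, lam=2.0):
        self.delta, self.lam = delta, lam   # tolerated fluctuation, detection threshold
        self.n = 0
        self.mean = 0.0                     # running mean of the score
        self.cum = 0.0                      # cumulative deviation of the score below its mean
        self.cum_min = 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += self.mean - x - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.lam   # True -> drift: adapt or relearn

random.seed(0)
scores = [-1.0 + random.gauss(0, 0.05) for _ in range(50)] + \
         [-2.0 + random.gauss(0, 0.05) for _ in range(50)]    # abrupt drift at batch 50
detector = PageHinkley()
for t, s in enumerate(scores):
    if detector.update(s):
        print("drift detected at batch", t)
        break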
Abstract:
Surmises of how myosin subfragment 1 (S1) interacts with actin filaments in muscle contraction rest upon knowing the relative arrangement of the two proteins. Although there exist crystallographic structures for both S1 and actin, as well as electron microscopy data for the acto–S1 complex (AS1), modeling of this arrangement has so far only been done “by eye.” Here we report fitted AS1 structures obtained using a quantitative method that is both more objective and makes more complete use of the data. Using undistorted crystallographic results, the best-fit AS1 structure shows significant differences from that obtained by visual fitting. The best fit is produced using the F-actin model of Holmes et al. [Holmes, K. C., Popp, D., Gebhard, W. & Kabsch, W. (1990) Nature (London) 347, 44–49]. S1 residues at the AS1 interface are now found at a higher radius as well as being translated axially and rotated azimuthally. Fits using S1 plus loops missing from the crystal structure were achieved using a homology search method to predict loop structures. These improved fits favor an arrangement in which the loop at the 50- to 20-kDa domain junction of S1 is located near the N terminus of actin. Rigid-body movements of the lower 50-kDa domain, which further improve the fit, produce closure of the large 50-kDa domain cleft and bring conserved residues in the lower 50-kDa domain into an apparently appropriate orientation for close interaction with actin. This finding supports the idea that binding of ATP to AS1 at the end of the ATPase cycle disrupts the actin binding site by changing the conformation of the 50-kDa cleft of S1.
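Quantitative placement of crystal structures into lower-resolution reconstructions is commonly scored by the correlation between the experimental map and a map simulated from the placed coordinates. The Python sketch below is a generic illustration of that idea (random rigid-body search over a Gaussian-blurred point-atom density), not the authors' specific fitting procedure.

import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.spatial.transform import Rotation

def simulate_map(coords, shape, voxel, origin, sigma_vox=1.5):
    # Render atoms as point masses on a grid, then blur to mimic a low-resolution map.
    grid = np.zeros(shape)
    idx = np.round((coords - origin) / voxel).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(shape)), axis=1)
    for i, j, k in idx[inside]:
        grid[i, j, k] += 1.0
    return gaussian_filter(grid, sigma_vox)

def correlation(a, b):
    a = a - a.mean(); b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum()))

def random_rigid_fit(coords, em_map, voxel, origin, n_trials=2000, max_shift=10.0, seed=0):
    # Random search over rigid-body placements, scored by map correlation.
    rng = np.random.default_rng(seed)
    centre = coords.mean(axis=0)
    best_score, best_transform = -np.inf, None
    for _ in range(n_trials):
        q = rng.normal(size=4); q /= np.linalg.norm(q)     # uniform random rotation
        R = Rotation.from_quat(q).as_matrix()
        t = rng.uniform(-max_shift, max_shift, size=3)     # translation in Angstroms
        placed = (coords - centre) @ R.T + centre + t
        score = correlation(simulate_map(placed, em_map.shape, voxel, origin), em_map)
        if score > best_score:
            best_score, best_transform = score, (R, t)
    return best_score, best_transform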
Abstract:
We report the identification and cloning of a 28-kDa polypeptide (p28) in Tetrahymena macronuclei that shares several features with the well-studied heterochromatin-associated protein HP1 from Drosophila. Notably, like HP1, p28 contains both a chromodomain and a chromoshadow domain. p28 also shares features with linker histone H1, and like H1, p28 is multiply phosphorylated, at least in part, by a proline-directed, Cdc2-type kinase. As such, p28 is referred to as Hhp1p (for H1/HP1-like protein). Hhp1p is missing from transcriptionally silent micronuclei but is enriched in heterochromatin-like chromatin bodies that presumably comprise repressed chromatin in macronuclei. These findings shed light on the evolutionarily conserved nature of heterochromatin in organisms ranging from ciliates to humans and provide further evidence that HP1-like proteins are not exclusively associated with permanently silent chromosomal domains. Our data support the view that members of this family also associate with repressed states of euchromatin.
Abstract:
Molecular and fragment ion data of intact 8- to 43-kDa proteins from electrospray Fourier-transform tandem mass spectrometry are matched against the corresponding data in sequence data bases. Extending the sequence tag concept of Mann and Wilm for matching peptides, a partial amino acid sequence in the unknown is first identified from the mass differences of a series of fragment ions, and the mass position of this sequence is defined from molecular weight and the fragment ion masses. For three studied proteins, a single sequence tag retrieved only the correct protein from the data base; a fourth protein required the input of two sequence tags. However, three of the data base proteins differed by having an extra methionine or by missing an acetyl or heme substitution. The positions of these modifications in the protein examined were greatly restricted by the mass differences of its molecular and fragment ions versus those of the data base. To characterize the primary structure of an unknown represented in the data base, this method is fast and specific and does not require prior enzymatic or chemical degradation.
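The core of the sequence-tag idea is that mass differences between successive fragment ions of the same series correspond to amino-acid residue masses. A minimal Python sketch is given below; the residue masses are standard monoisotopic values (rounded), while the fragment-ion series is hypothetical.

# Monoisotopic residue masses in Da (rounded to two decimals).
RESIDUE_MASS = {
    "G": 57.02, "A": 71.04, "S": 87.03, "P": 97.05, "V": 99.07, "T": 101.05,
    "C": 103.01, "L": 113.08, "I": 113.08, "N": 114.04, "D": 115.03,
    "Q": 128.06, "K": 128.09, "E": 129.04, "M": 131.04, "H": 137.06,
    "F": 147.07, "R": 156.10, "Y": 163.06, "W": 186.08,
}

def sequence_tag(fragment_masses, tol=0.05):
    # Read a partial sequence from differences between successive fragment-ion masses;
    # gaps that match no residue within the tolerance are reported as '?'.
    ordered = sorted(fragment_masses)
    tag = []
    for lo, hi in zip(ordered, ordered[1:]):
        diff = hi - lo
        hits = sorted({aa for aa, m in RESIDUE_MASS.items() if abs(m - diff) <= tol})
        tag.append("/".join(hits) if hits else "?")
    return ordered[0], tag     # start mass of the tag and the inferred residues

# Hypothetical fragment-ion series from an intact-protein tandem mass spectrum.
start, tag = sequence_tag([1200.60, 1313.68, 1412.75, 1499.78])
print(start, tag)   # differences 113.08, 99.07, 87.03 -> ['I/L', 'V', 'S']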