844 results for Local classification method
Abstract:
BACKGROUND Record linkage of existing individual health care data is an efficient way to answer important epidemiological research questions. Reuse of individual health-related data faces several problems: either a unique personal identifier, such as a social security number, is not available, or non-unique person identifiable information, such as names, is privacy protected and cannot be accessed. One way to protect privacy in probabilistic record linkage is to encrypt this sensitive information. Unfortunately, the encrypted hash codes of two names differ completely even if the plain names differ by only a single character. Therefore, standard encryption methods cannot be applied. To overcome these challenges, we developed the Privacy Preserving Probabilistic Record Linkage (P3RL) method. METHODS In this Privacy Preserving Probabilistic Record Linkage method we apply a three-party protocol, with two sites collecting individual data and an independent trusted linkage center as the third partner. Our method consists of three main steps: pre-processing, encryption and probabilistic record linkage. Data pre-processing and encryption are done at the sites by local personnel. To guarantee similar quality and format of variables and an identical encryption procedure at each site, the linkage center generates semi-automated pre-processing and encryption templates. To retrieve information (i.e. data structure) for the creation of templates without ever accessing plain person identifiable information, we introduced a novel method of data masking. Sensitive string variables are encrypted using Bloom filters, which enable the calculation of similarity coefficients. For date variables, we developed special encryption procedures to handle the most common date errors. The linkage center performs probabilistic record linkage with encrypted person identifiable information and plain non-sensitive variables.
RESULTS In this paper we describe step by step how to link existing health-related data using encryption methods to preserve the privacy of persons in the study. CONCLUSION Privacy Preserving Probabilistic Record Linkage expands record linkage facilities in settings where a unique identifier is unavailable and/or regulations restrict access to the non-unique person identifiable information needed to link existing health-related data sets. Automated pre-processing and encryption fully protect sensitive information, ensuring participant confidentiality. This method is suitable not just for epidemiological research but also for any setting with similar challenges.
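The Bloom-filter idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the P3RL implementation: the bigram padding, the keyed HMAC-SHA256 hashing, the filter size, the number of hash functions, and the names `bigrams`, `bloom_encode` and `dice` are all assumptions made for this illustration. It only shows why names that differ by one character still yield similar encodings, whereas an ordinary hash of the whole name would not.

```python
import hashlib
import hmac


def bigrams(name):
    """Split a padded, lower-cased name into its set of 2-character substrings."""
    padded = f"_{name.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(name, secret, size=1024, num_hashes=4):
    """Return the set of bit positions switched on in the Bloom filter for `name`.

    Each bigram is hashed `num_hashes` times with a keyed hash (HMAC-SHA256),
    so both sites must share the same `secret` (an assumption of this sketch).
    """
    key = secret.encode("utf-8")
    bits = set()
    for gram in bigrams(name):
        for k in range(num_hashes):
            digest = hmac.new(key, f"{k}|{gram}".encode("utf-8"), hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


def dice(a, b):
    """Dice coefficient of two bit-position sets: 2 * |A & B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    secret = "shared-secret"  # hypothetical key agreed on by both sites
    meier = bloom_encode("meier", secret)
    meyer = bloom_encode("meyer", secret)
    schmidt = bloom_encode("schmidt", secret)
    print("meier vs meyer:  ", round(dice(meier, meyer), 3))
    print("meier vs schmidt:", round(dice(meier, schmidt), 3))
```

In a setup like this, a linkage center would only ever see the bit positions, and the Dice coefficient could serve as the similarity input to the probabilistic linkage step.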
Abstract:
The long-term integrity of protected areas (PAs), and hence the maintenance of related ecosystem services (ES), depends on the support of local people. In the present study, local people's perceptions of ecosystem services from PAs and the factors that govern local preferences for PAs are assessed. Fourteen study villages were randomly selected from three different protected forest areas and one control site along the southern coast of Côte d'Ivoire. Data were collected through a mixed-method approach, including qualitative semi-structured interviews and a household survey based on hypothetical choice scenarios. Local people's perceptions of ecosystem service provision were deciphered through qualitative content analysis, while the relation between people's preferences and potential factors that affect preferences was analyzed through multinomial models. This study shows that rural villagers do perceive a number of different ecosystem services as benefits from PAs in Côte d'Ivoire. The results based on quantitative data also suggest that local preferences for PAs and related ecosystem services are driven by PAs' management rules, age, and people's dependence on natural resources.
Abstract:
Using the asymptotic form of the bulk Weyl tensor, we present an explicit approach that allows us to reconstruct exact four-dimensional Einstein spacetimes which are algebraically special with respect to Petrov’s classification. If the boundary metric supports a traceless, symmetric and conserved complex rank-two tensor, which is related to the boundary Cotton and energy-momentum tensors, and if the hydrodynamic congruence is shearless, then the bulk metric is exactly resummed and captures modes that stand beyond the hydrodynamic derivative expansion. We illustrate the method when the congruence has zero vorticity, leading to the Robinson-Trautman spacetimes of arbitrary Petrov class, and quote the case of non-vanishing vorticity, which captures the Plebański-Demiański Petrov D family.
Abstract:
Optimal adjustment of brain networks allows the biased processing of information in response to environmental demands and is therefore a prerequisite for adaptive behaviour. It has been widely shown that a biased state of networks is associated with a particular cognitive process. However, those associations were identified by backward categorization of trials and cannot establish a causal association with cognitive processes. This problem remains a major obstacle to advancing the field, particularly human cognitive neuroscience. In my talk, I will present two approaches to address the causal relationships between brain network interactions and behaviour. Firstly, we combined connectivity analysis of fMRI data and a machine learning method to predict inter-individual differences in behaviour and responsiveness to environmental demands. The connectivity-based classification approach outperforms local activation-based classification analysis, suggesting that interactions in brain networks carry information about instantaneous cognitive processes. Secondly, we have recently established a new method combining transcranial alternating current stimulation (tACS), transcranial magnetic stimulation (TMS), and EEG. We use the method to measure signal transmission between brain areas while introducing extrinsic oscillatory brain activity and to study the causal association between oscillatory activity and behaviour. We show that phase-matched oscillatory activity creates phase-dependent modulation of signal transmission between brain areas, while phase-shifted oscillatory activity blunts the phase-dependent modulation. The results suggest that phase coherence between brain areas plays a cardinal role in signal transmission in brain networks. In sum, I argue that causal approaches will provide a more concrete backbone for cognitive neuroscience.
Abstract:
We present a novel surrogate model-based global optimization framework allowing a large number of function evaluations. The method, called SpLEGO, is based on a multi-scale expected improvement (EI) framework relying on both sparse and local Gaussian process (GP) models. First, a bi-objective approach relying on a global sparse GP model is used to determine potential next sampling regions. Local GP models are then constructed within each selected region. The method subsequently employs the standard expected improvement criterion to deal with the exploration-exploitation trade-off within selected local models, leading to a decision on where to perform the next function evaluation(s). The potential of our approach is demonstrated using the so-called Sparse Pseudo-input GP as a global model. The algorithm is tested on four benchmark problems, whose number of starting points ranges from 10^2 to 10^4. Our results show that SpLEGO is effective and capable of solving problems with a large number of starting points, and it even provides significant advantages when compared with state-of-the-art EI algorithms.
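For readers unfamiliar with the standard expected improvement criterion referred to above, the following sketch evaluates its closed form for a minimization problem. It is not SpLEGO itself: the GP posterior mean and standard deviation are taken as given inputs, and the candidate values are hypothetical numbers chosen for illustration.

```python
import math


def expected_improvement(mu, sigma, best):
    """Closed-form EI for minimization at a point with GP posterior N(mu, sigma^2).

    `best` is the lowest function value observed so far.
    EI = (best - mu) * Phi(z) + sigma * phi(z), with z = (best - mu) / sigma.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mu) * cdf + sigma * pdf


if __name__ == "__main__":
    best_so_far = 1.0
    # Hypothetical candidates: (label, posterior mean, posterior std).
    candidates = [("exploit", 0.9, 0.05), ("explore", 1.5, 1.0), ("poor", 2.0, 0.1)]
    for label, mu, sigma in candidates:
        print(label, expected_improvement(mu, sigma, best_so_far))
    chosen = max(candidates, key=lambda c: expected_improvement(c[1], c[2], best_so_far))
    print("next evaluation:", chosen[0])
```

The criterion rewards both a low predicted mean (exploitation) and a large predictive uncertainty (exploration), which is the trade-off the abstract mentions.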
Lung Pattern Classification for Interstitial Lung Diseases Using a Deep Convolutional Neural Network
Abstract:
Automated tissue characterization is one of the most crucial components of a computer aided diagnosis (CAD) system for interstitial lung diseases (ILDs). Although much research has been conducted in this field, the problem remains challenging. Deep learning techniques have recently achieved impressive results in a variety of computer vision problems, raising expectations that they might be applied in other domains, such as medical image analysis. In this paper, we propose and evaluate a convolutional neural network (CNN) designed for the classification of ILD patterns. The proposed network consists of 5 convolutional layers with 2×2 kernels and LeakyReLU activations, followed by average pooling with size equal to the size of the final feature maps and three dense layers. The last dense layer has 7 outputs, corresponding to the classes considered: healthy, ground glass opacity (GGO), micronodules, consolidation, reticulation, honeycombing and a combination of GGO/reticulation. To train and evaluate the CNN, we used a dataset of 14696 image patches, derived from 120 CT scans from different scanners and hospitals. To the best of our knowledge, this is the first deep CNN designed for this specific problem. A comparative analysis demonstrated the effectiveness of the proposed CNN against previous methods on a challenging dataset. The classification performance (~85.5%) demonstrated the potential of CNNs in analyzing lung patterns. Future work includes extending the CNN to three-dimensional data provided by CT volume scans and integrating the proposed method into a CAD system that aims to provide differential diagnosis for ILDs as a supportive tool for radiologists.
Abstract:
Historically, morphological features were used as the primary means to classify organisms. However, the age of molecular genetics has allowed us to approach this field from the perspective of the organism's genetic code. Early work used highly conserved sequences, such as ribosomal RNA. The increasing number of complete genomes in the public data repositories provides the opportunity to look not only at a single gene, but at organisms' entire parts list. Here the Sequence Comparison Index (SCI) and the Organism Comparison Index (OCI), algorithms and methods to compare proteins and proteomes, are presented. The complete proteomes of 104 sequenced organisms were compared. Over 280 million full Smith-Waterman alignments were performed on sequence pairs which had a reasonable expectation of being related. From these alignments a whole proteome phylogenetic tree was constructed. This method was also used to compare the small subunit (SSU) rRNA from each organism, and a tree was constructed from these results. The SSU rRNA tree by the SCI/OCI method looks very much like accepted SSU rRNA trees from sources such as the Ribosomal Database Project, thus validating the method. The SCI/OCI proteome tree showed a number of small but significant differences when compared to the SSU rRNA tree and proteome trees constructed by other methods. Horizontal gene transfer does not appear to affect the SCI/OCI trees until the transferred genes make up a large portion of the proteome. As part of this work, the Database of Related Local Alignments (DaRLA) was created and contains over 81 million rows of sequence alignment information. DaRLA, while primarily used to build the whole proteome trees, can also be applied to shared gene content analysis, gene order analysis, and creating individual protein trees. Finally, the standard BLAST method for analyzing shared gene content was compared to the SCI method using 4 spirochetes.
The SCI system performed flawlessly, finding all proteins from one organism against itself and finding all the ribosomal proteins between organisms. The BLAST system missed some proteins from its respective organism and failed to detect small ribosomal proteins between organisms.
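The Smith-Waterman alignments mentioned above compute the best-scoring local alignment between two sequences. The sketch below shows the core dynamic-programming recurrence only. It assumes a simple match/mismatch score and a linear gap penalty with hypothetical values; protein comparisons normally use a substitution matrix and affine gaps, and the scoring scheme behind SCI/OCI is not stated in the abstract.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences `a` and `b`.

    Uses two rows of the dynamic-programming matrix; every cell is floored at 0,
    which is what makes the alignment local rather than global.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # The shared core "ACGT" is found despite unrelated flanking residues.
    print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

Because the score never drops below zero, unrelated flanking regions do not penalize a well-conserved core, which is why local alignment suits comparisons of proteins that share only some domains.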
Abstract:
The Houston region is home to arguably the largest petrochemical and refining complex anywhere. The effluent of this complex includes many potentially hazardous compounds. Study of some of these compounds has led to recognition that a number of known and probable carcinogens are at elevated levels in ambient air. Two of these, benzene and 1,3-butadiene, have been found in concentrations which may pose a health risk for residents of Houston. Recent popular journalism and publications by local research institutions have increased public interest in Houston's air quality. Much of the literature has been critical of local regulatory agencies' oversight of industrial pollution. A number of citizens in the region have begun to volunteer with air quality advocacy groups in the testing of community air. Inexpensive methods exist for monitoring ambient concentrations of ozone, particulate matter and airborne toxics. This study is an evaluation of a technique that has been successfully applied to airborne toxics. This technique, solid phase microextraction (SPME), has been used to measure airborne volatile organic hydrocarbons at community-level concentrations. It has yielded accurate and rapid concentration estimates at a relatively low cost per sample. Examples of its application to measurement of airborne benzene exist in the literature. None have been found for airborne 1,3-butadiene. These compounds were selected for an evaluation of SPME as a community-deployed technique, to replicate previous application to benzene, to expand application to 1,3-butadiene, and because of the salience of these compounds in this community. This study demonstrates that SPME is a useful technique for quantification of 1,3-butadiene at concentrations observed in Houston. Laboratory background levels precluded recommendation of the technique for benzene.
One type of SPME fiber, 85 μm Carboxen/PDMS, was found to be a sensitive sampling device for 1,3-butadiene under temperature and humidity conditions common in Houston. This study indicates that these variables affect instrument response, which suggests that calibration is needed within the specific conditions of these variables. While deployment of this technique was less expensive than other methods of quantification of 1,3-butadiene, the complexity of calibration may exclude an SPME method from broad deployment by community groups.
Abstract:
My dissertation focuses on two aspects of RNA sequencing technology. The first is the methodology for modeling the overdispersion inherent in RNA-seq data for differential expression analysis. This aspect is addressed in three sections. The second aspect is the application of RNA-seq data to identify the CpG island methylator phenotype (CIMP) by integrating datasets of mRNA expression level and DNA methylation status. Section 1: The cost of DNA sequencing has fallen dramatically in the past decade. Consequently, genomic research increasingly depends on sequencing technology. However, it remains unclear how sequencing capacity influences the accuracy of mRNA expression measurement. We observe that accuracy improves with increasing sequencing depth. To model the overdispersion, we use the beta-binomial distribution with a new parameter indicating the dependency between overdispersion and sequencing depth. Our modified beta-binomial model performs better than the binomial or the pure beta-binomial model, with a lower false discovery rate. Section 2: Although a number of methods have been proposed to accurately analyze differential RNA expression at the gene level, modeling at the base pair level is required. Here, we find that the overdispersion rate decreases as the sequencing depth increases at the base pair level. We also propose four models and compare them with each other. As expected, our beta-binomial model with a dynamic overdispersion rate is shown to be superior. Section 3: We investigate biases in RNA-seq by exploring the measurement of the external control, spike-in RNA. This study is based on two datasets with spike-in controls obtained from a recent study. We observe a previously undiscovered bias in the measurement of the spike-in transcripts that arises from the influence of the sample transcripts in RNA-seq. We also find that this influence is related to the local sequence of the random hexamer that is used in priming.
We suggest a model of the inequality between samples to correct this type of bias. Section 4: The expression of a gene can be turned off when its promoter is highly methylated. Several studies have reported that a clear threshold effect exists in gene silencing that is mediated by DNA methylation. It is reasonable to assume the thresholds are specific for each gene. It is also intriguing to investigate genes that are largely controlled by DNA methylation. These genes are called “L-shaped” genes. We develop a method to determine the DNA methylation threshold and identify a new CIMP of BRCA. In conclusion, we provide a detailed understanding of the relationship between the overdispersion rate and sequencing depth. We also reveal a new bias in RNA-seq and provide a detailed understanding of the relationship between this new bias and the local sequence. Finally, we develop a powerful method to dichotomize methylation status and consequently identify a new CIMP of breast cancer with a distinct classification of molecular characteristics and clinical features.
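To make the notion of overdispersion concrete, the sketch below compares the variance of a standard beta-binomial distribution with that of a binomial with the same mean. It covers only the textbook beta-binomial; the dissertation's additional depth-dependent parameter is not specified in the abstract and is not reproduced here. The read depth and shape parameters are hypothetical values chosen for illustration.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(K = k) for a beta-binomial with n trials and shape parameters alpha, beta."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def mean_and_variance(pmf_values):
    """Mean and variance of a distribution given as [P(0), P(1), ..., P(n)]."""
    mean = sum(k * p for k, p in enumerate(pmf_values))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf_values))
    return mean, var


if __name__ == "__main__":
    n, alpha, beta = 50, 2.0, 6.0          # hypothetical depth and shape parameters
    p = alpha / (alpha + beta)             # mean success probability
    rho = 1.0 / (alpha + beta + 1.0)       # overdispersion (intra-class correlation)
    pmf = [beta_binomial_pmf(k, n, alpha, beta) for k in range(n + 1)]
    mean, var = mean_and_variance(pmf)
    print("beta-binomial variance:", var)
    print("closed form:           ", n * p * (1 - p) * (1 + (n - 1) * rho))
    print("binomial variance:     ", n * p * (1 - p))
```

The factor `1 + (n - 1) * rho` is what inflates the variance beyond the binomial case, so a model that lets the overdispersion vary with depth changes how strongly that inflation applies at each coverage level.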
Abstract:
In this study, multibeam angular backscatter data acquired on the eastern slope of the Porcupine Seabight are analysed. The angular backscatter data were processed using the 'NRGCOR' software for 29 locations comprising different geological provinces: carbonate mounds, buried mounds, seafloor channels, and inter-channel areas. A detailed methodology is developed to produce a map of angle-invariant (normalized) backscatter data by correcting the local angular backscatter values. The present paper describes the detailed processing steps and related technical aspects of the normalization approach. The presented angle-invariant backscatter map possesses a 12 dB dynamic range in terms of grey scale. A clear distinction is seen between the mound-dominated northern area (Belgica province) and the Gollum channel seafloor at the southern end of the site. Qualitative analyses of the calculated mean backscatter values, i.e., grey scale levels, utilizing angle-invariant backscatter data generally indicate that backscatter values are highest (lighter grey scale) in the mound areas, followed by buried mounds. The backscatter values are lowest in the inter-channel areas (lowest grey scale level). Moderate backscatter values (medium grey level) are observed in the Gollum and Kings channel data, with significant variability within the channel seafloor provinces. The segmentation of the channel seafloor provinces is based on the computed grey scale levels, for further analyses based on the angular backscatter strength. Three major parameters are utilized to classify four different seafloor provinces of the Porcupine Seabight by employing a semi-empirical method to analyse multibeam angular backscatter data. The predicted backscatter response, computed at 20°, is highest for the mound areas. The coefficient of variation (CV) of the mean backscatter response is also highest for the mound areas.
Interestingly, the slope value of the buried mound areas is found to be the highest. However, the channel seafloor of moderate backscatter response presents the lowest slope and CV values. A critical examination of the inter-channel areas indicates less variability within the three estimated parameters.
Abstract:
This study subdivides Potter Cove, King George Island, Antarctica, into seafloor regions using multivariate statistical methods. These regions are categories used for comparing, contrasting and quantifying biogeochemical processes and biodiversity between ocean regions geographically but also regions under development within the scope of global change. The division obtained is characterized by the dominating components and interpreted in terms of ruling environmental conditions. The analysis includes a total of 42 different environmental variables, interpolated from samples taken during the austral summer seasons 2010/2011 and 2011/2012. The statistical errors of several interpolation methods (e.g. IDW, Indicator, Ordinary and Co-Kriging) with varying settings were compared and the most reasonable method was applied. The multivariate mathematical procedures used are regionalized classification via k-means cluster analysis, canonical-correlation analysis and multidimensional scaling. Canonical-correlation analysis identifies the influencing factors in the different parts of the cove. Several methods for identifying the optimum number of clusters were tested, and 4, 7, 10 and 12 were identified as reasonable numbers of clusters for Potter Cove. In particular, the results for 10 and 12 clusters identify marine-influenced regions which can be clearly separated from those determined by the geological catchment area and those dominated by river discharge.
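As a rough illustration of the k-means step, the sketch below runs Lloyd's algorithm on a handful of two-dimensional points and reports the within-cluster sum of squares, a quantity commonly inspected when choosing the number of clusters. The points, the initial centers and the function names are hypothetical; the study itself clustered 42 interpolated environmental variables and does not state which software or initialization it used.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iterations=100):
    """Lloyd's algorithm started from the given initial centers.

    Returns (labels, centers). A cluster that loses all its points keeps its old center.
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iterations):
        new_labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move either
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centers


def within_cluster_ss(points, labels, centers):
    """Total squared distance of every point to its assigned center."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))


if __name__ == "__main__":
    # Hypothetical standardized values of two variables at six grid cells.
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centers = kmeans(points, [(0, 0), (10, 10)])
    print(labels, centers, within_cluster_ss(points, labels, centers))
```

Repeating the run for several values of k and comparing the resulting sums of squares is one simple way to shortlist candidate cluster counts, in the spirit of the comparison described in the abstract.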
Abstract:
Large-scale environmental patterns in the Humboldt Current System (HCS) show major changes during strong El Niño episodes, leading to the mass mortality of dominant species in coastal ecosystems. Here we explore how these changes affect the life-history traits of the surf clam Mesodesma donacium. Growth and mortality rates under normal temperature and salinity were compared to those under anomalous (El Niño) higher temperature and reduced salinity. Moreover, the reproductive spatial-temporal patterns along the distribution range were studied, and their relationship to large-scale environmental variability was assessed. M. donacium is highly sensitive to temperature changes, supporting the hypothesis of temperature as the key factor leading to mass mortality events of this clam in northern populations. In contrast, this species, particularly juveniles, was remarkably tolerant to low salinity, which may be related to submarine groundwater discharge in Hornitos, northern Chile. The enhanced osmotic tolerance by juveniles may represent an adaptation of early life stages allowing settlement in vacant areas at outlets of estuarine areas. The strong seasonality in freshwater input and in upwelling strength seems to be linked to the spatial and temporal patterns in the reproductive cycle. Owing to its origin and thermal sensitivity, the expansion and dominance of M. donacium from the Pliocene/Pleistocene transition until the present seem closely linked to the establishment and development of the cold HCS. Therefore, the recurrence of warming events (particularly El Niño since at least the Holocene) has submitted this cold-water species to a continuous local extinction-recolonization process.
Abstract:
The major-element and most of the trace-element data from the different laboratories that contributed to the study of samples recovered during Leg 82 are presented in the following tables. The different basalt groups, identified on the basis of their chemical properties (major and trace elements), were defined from the data available on board the Glomar Challenger as the cruise progressed (see site chapters, all sites, this volume). Most of the data obtained since the end of the cruise and presented in these tables confirm the classification that was proposed by the shipboard party (see site chapters, all sites, this volume). Nevertheless, special mention should be made about Site 564. The shipboard party proposed a single chemical group at this site but noticed significant variations down the hole, mainly in trace-element data. However, the range of variation was small compared to the precision of the measurements. These variations were confirmed by the onshore studies (see papers in Part IV of this volume, especially Brannon's paper, partly devoted to this topic).
Abstract:
This study describes detailed partitioning of phytomass carbon (C) and soil organic carbon (SOC) for four study areas in discontinuous permafrost terrain, Northeast European Russia. The mean aboveground phytomass C storage is 0.7 kg C/m². Estimated landscape SOC storage in the four areas varies between 34.5 and 47.0 kg C/m² with LCC (land cover classification) upscaling and 32.5-49.0 kg C/m² with soil map upscaling. A nested upscaling approach using a Landsat Thematic Mapper land cover classification for the surrounding region provides estimates within 5 ± 5% of the local high-resolution estimates. Permafrost peat plateaus hold the majority of total and frozen SOC, especially in the more southern study areas. Burying of SOC through cryoturbation of O- or A-horizons contributes between 1% and 16% (mean 5%) of total landscape SOC. The effect of active layer deepening and thermokarst expansion on SOC remobilization is modeled for one of the four areas. The active layer thickness dynamics from 1980 to 2099 is modeled using a transient spatially distributed permafrost model and lateral expansion of peat plateau thermokarst lakes is simulated using geographic information system analyses. Active layer deepening is expected to increase the proportion of SOC affected by seasonal thawing from 29% to 58%. A lateral expansion of 30 m would increase the amount of SOC stored in thermokarst lakes/fens from 2% to 22% of all SOC. By the end of this century, active layer deepening will likely affect more SOC than thermokarst expansion, but the SOC stores vulnerable to thermokarst are less decomposed.
Abstract:
Locally weighted regression is a technique that predicts the response for new data items from their neighbors in the training data set, where closer data items are assigned higher weights in the prediction. However, the original method may suffer from overfitting and fail to select the relevant variables. In this paper we propose combining a regularization approach with locally weighted regression to achieve sparse models. Specifically, the lasso is a shrinkage and selection method for linear regression. We present an algorithm that embeds the lasso in an iterative procedure that alternately computes weights and performs lasso-wise regression. The algorithm is tested on three synthetic scenarios and two real data sets. Results show that the proposed method outperforms linear and local models for several kinds of scenarios.
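The combination described above can be sketched as a single pass: compute kernel weights around a query point, then fit a weighted lasso by coordinate descent. This is an illustration under stated assumptions, not the paper's algorithm: the Gaussian kernel, the bandwidth, the penalty value and all function names are hypothetical, and the paper's iterative alternation between weight computation and lasso fitting is not reproduced.

```python
import math


def soft_threshold(value, lam):
    """Soft-thresholding operator used by lasso coordinate descent."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def kernel_weights(X, x0, bandwidth):
    """Gaussian kernel weights: training rows closer to the query x0 weigh more."""
    return [
        math.exp(-sum((a - b) ** 2 for a, b in zip(row, x0)) / (2.0 * bandwidth ** 2))
        for row in X
    ]


def weighted_lasso(X, y, w, lam, sweeps=100):
    """Minimize 0.5 * sum_i w_i (y_i - b0 - x_i . beta)^2 + lam * sum_j |beta_j|.

    Cyclic coordinate descent; the intercept b0 is not penalized.
    """
    p = len(X[0])
    beta = [0.0] * p
    intercept = 0.0
    total_w = sum(w)
    for _ in range(sweeps):
        intercept = sum(
            wi * (yi - sum(b * x for b, x in zip(beta, row)))
            for wi, yi, row in zip(w, y, X)
        ) / total_w
        for j in range(p):
            rho, norm = 0.0, 0.0
            for wi, yi, row in zip(w, y, X):
                # Residual with feature j left out of the current fit.
                partial = yi - intercept - sum(beta[k] * row[k] for k in range(p) if k != j)
                rho += wi * row[j] * partial
                norm += wi * row[j] ** 2
            beta[j] = soft_threshold(rho, lam) / norm if norm > 0.0 else 0.0
    return intercept, beta


def predict_local(X, y, x0, bandwidth=1.0, lam=0.5):
    """Fit a weighted lasso around x0 and return (prediction at x0, local coefficients)."""
    w = kernel_weights(X, x0, bandwidth)
    intercept, beta = weighted_lasso(X, y, w, lam)
    return intercept + sum(b * x for b, x in zip(beta, x0)), beta


if __name__ == "__main__":
    # Synthetic data: the response depends on the first feature only.
    X = [(x1, x2) for x1 in (-2, -1, 0, 1, 2) for x2 in (-1, 0, 1)]
    y = [3.0 * x1 for x1, _ in X]
    prediction, beta = predict_local(X, y, (0.0, 0.0))
    print(prediction, beta)
```

On this synthetic grid the coefficient of the irrelevant second feature is set exactly to zero while the first is slightly shrunk below 3, which illustrates the variable selection that a plain locally weighted fit would not perform.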