902 results for Graph-based methods
Abstract:
The realization that statistical physics methods can be applied to analyze written texts represented as complex networks has led to several developments in natural language processing, including automatic summarization and evaluation of machine translation. Most importantly, so far only a few metrics of complex networks have been used and therefore there is ample opportunity to enhance the statistics-based methods as new measures of network topology and dynamics are created. In this paper, we employ for the first time the metrics betweenness, vulnerability and diversity to analyze written texts in Brazilian Portuguese. Using strategies based on diversity metrics, a better performance in automatic summarization is achieved in comparison to previous work employing complex networks. With an optimized method the Rouge score (an automatic evaluation method used in summarization) was 0.5089, which is the best value ever achieved for an extractive summarizer with statistical methods based on complex networks for Brazilian Portuguese. Furthermore, the diversity metric can detect keywords with high precision, which is why we believe it is suitable to produce good summaries. It is also shown that incorporating linguistic knowledge through a syntactic parser does enhance the performance of the automatic summarizers, as expected, but the increase in the Rouge score is only minor. These results reinforce the suitability of complex network methods for improving automatic summarizers in particular, and treating text in general. (C) 2011 Elsevier B.V. All rights reserved.
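As a rough illustration of the network-based extractive summarization idea described above (not the authors' exact metrics or pipeline), the sketch below builds a sentence graph from word overlap and ranks sentences by betweenness centrality with networkx; the toy text, the similarity rule and the summary length are hypothetical.

    import itertools
    import networkx as nx

    # Hypothetical toy "document": one string per sentence.
    sentences = [
        "Complex networks can model written texts as graphs of sentences.",
        "Centrality metrics computed on the sentence graph highlight important sentences.",
        "Important sentences identified by centrality metrics are extracted into the summary.",
        "The summary is evaluated with the Rouge score.",
    ]

    def words(s):
        return set(w.lower().strip(".,") for w in s.split())

    # Connect sentences that share at least one word (a crude similarity measure).
    G = nx.Graph()
    G.add_nodes_from(range(len(sentences)))
    for i, j in itertools.combinations(range(len(sentences)), 2):
        if words(sentences[i]) & words(sentences[j]):
            G.add_edge(i, j)

    # Rank sentences by betweenness centrality and extract the top two, in document order.
    scores = nx.betweenness_centrality(G)
    top = sorted(sorted(scores, key=scores.get, reverse=True)[:2])
    print(" ".join(sentences[i] for i in top))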
Abstract:
During the last three decades, several predictive models have been developed to estimate the somatic production of macroinvertebrates. Although the models have been evaluated for their ability to assess the production of macrobenthos in different marine ecosystems, these approaches have not been applied specifically to sandy beach macrofauna and may not be directly applicable to this transitional environment. Hence, in this study, a broad literature review of sandy beach macrofauna production was conducted and estimates obtained with cohort-based and size-based methods were collected. The performance of nine models in estimating the production of individual populations from the sandy beach environment, evaluated for all taxonomic groups combined and for individual groups separately, was assessed, comparing the production predicted by the models to the estimates obtained from the literature (observed production). Most of the models overestimated population production compared to observed production estimates, whether for all populations combined or more specific taxonomic groups. However, estimates by two models developed by Cusson and Bourget provided best fits to measured production, and thus represent the best alternatives to the cohort-based and size-based methods in this habitat. The consistent performance of one of these Cusson and Bourget models, which was developed for the macrobenthos of sandy substrate habitats (C&B-SS), shows that the performance of a model does not depend on whether it was developed for a specific taxonomic group. Moreover, since some widely used models (e.g., the Robertson model) show very different responses when applied to the macrofauna of different marine environments (e.g., sandy beaches and estuaries), prior evaluation of these models is essential.
Abstract:
Background: The temporal and geographical diversification of Neotropical insects remains poorly understood because of the complex changes in geological and climatic conditions that occurred during the Cenozoic. To better understand extant patterns in Neotropical biodiversity, we investigated the evolutionary history of three Neotropical swallowtail Troidini genera (Papilionidae). First, DNA-based species delimitation analyses were conducted to assess species boundaries within Neotropical Troidini using an enlarged fragment of the standard barcode gene. Molecularly delineated species were then used to infer a time-calibrated species-level phylogeny based on a three-gene dataset and Bayesian dating analyses. The corresponding chronogram was used to explore their temporal and geographical diversification through distinct likelihood-based methods. Results: The phylogeny for Neotropical Troidini was well resolved and strongly supported. Molecular dating and biogeographic analyses indicate that the extant lineages of Neotropical Troidini have a late Eocene (33-42 Ma) origin in North America. Two independent lineages (Battus and Euryades + Parides) reached South America via the GAARlandia temporary connection, and later became extinct in North America. They only began substantive diversification during the early Miocene in Amazonia. Macroevolutionary analysis supports the "museum model" of diversification, rather than Pleistocene refugia, as the best explanation for the diversification of these lineages. Conclusions: This study demonstrates that: (i) current Neotropical biodiversity may have originated ex situ; (ii) the GAARlandia bridge was important in facilitating invasions of South America; (iii) colonization of Amazonia initiated the crown diversification of these swallowtails; and (iv) Amazonia is not only a species-rich region but also acted as a sanctuary for the dynamics of this diversity. In particular, Amazonia probably allowed the persistence of old lineages and contributed to the steady accumulation of diversity over time with constant net diversification rates, a result that contrasts with previous studies on other South American butterflies.
Abstract:
Industrial recurrent event data, in which an event of interest can be observed more than once in a single sample unit, arise in several areas, such as engineering, manufacturing and industrial reliability. Such data provide information about the number of events, the time to their occurrence and also their costs. Nelson (1995) presents a methodology to obtain asymptotic confidence intervals for the cumulative cost and the cumulative number of recurrent events. Although this is a standard procedure, it may not perform well in some situations, in particular when the available sample size is small. In this context, computer-intensive methods such as the bootstrap can be used to construct confidence intervals. In this paper, we propose a bootstrap-based technique to obtain interval estimates for the cumulative cost and the cumulative number of events. Advantages of the proposed methodology are its applicability in several areas and its easy computational implementation. In addition, according to Monte Carlo simulations, it can be a better alternative to asymptotic methods for calculating confidence intervals. An example from engineering illustrates the methodology.
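As an illustration of the general idea described above (not the authors' exact procedure), the sketch below resamples sample units with replacement, recomputes the mean cumulative number of events at a time point of interest, and forms a percentile confidence interval; the toy data are hypothetical and complete follow-up up to the chosen time is assumed.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy recurrent-event data: one list of event times per sample unit (hypothetical).
    units = [
        [1.2, 3.4, 7.8],
        [0.9, 5.1],
        [2.2, 2.9, 6.0, 9.3],
        [4.4],
        [1.1, 8.2],
    ]

    def mean_cumulative_count(sample, t):
        """Average number of events observed up to time t per unit (naive MCF estimate)."""
        return np.mean([sum(e <= t for e in events) for events in sample])

    t0 = 5.0
    point_estimate = mean_cumulative_count(units, t0)

    # Nonparametric bootstrap: resample whole units with replacement.
    B = 2000
    boot = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, len(units), size=len(units))
        boot[b] = mean_cumulative_count([units[i] for i in idx], t0)

    lower, upper = np.percentile(boot, [2.5, 97.5])
    print(f"MCF({t0}) = {point_estimate:.2f}, 95% percentile CI = ({lower:.2f}, {upper:.2f})")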
Abstract:
Haemophilus parasuis infection, known as Glässer’s disease, is characterized by fibrinous polyserositis, arthritis and meningitis in piglets. Although traditional diagnosis is based on herd history, clinical signs, bacterial isolation and serotyping, molecular methods are alternatives for species-specific testing and epidemiologic studies. The aim of this study was to characterize H. parasuis strains isolated from different states of Brazil by serotyping, PCR and ERIC-PCR. Serotyping revealed serovar 4 as the most prevalent (24%), followed by serovars 14 (14%), 5 (12%), 13 (8%) and 2 (2%), whereas 40% of the strains were considered non-typeable. Of the 50 strains tested, 43 (86%) were positive for the Group 1 vtaA gene, which has been related to virulent strains of H. parasuis. ERIC-PCR was able to type the isolates, including the non-typeable strains, into 23 different patterns. The ERIC-PCR patterns were very heterogeneous and showed high similarity between strains from the same animal or farm. The results indicate that ERIC-PCR is a valuable tool for typing H. parasuis isolates collected in Brazil.
Abstract:
Due to the growing interest in social networks, link prediction has received significant attention. Link prediction is mostly based on graph-based features, with some recent approaches focusing on domain semantics. We propose algorithms for link prediction that use a probabilistic ontology to enhance the analysis of the domain and to handle the unavoidable uncertainty in the task (the ontology is specified in the probabilistic description logic crALC). The scalability of the approach is investigated through a combination of semantic assumptions and graph-based features. We empirically evaluate our proposal and compare it with standard solutions in the literature.
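For readers unfamiliar with the graph-based features mentioned above, the sketch below scores candidate links on a toy graph with two classical measures (common neighbours and the Adamic-Adar index) using networkx; the paper's probabilistic-ontology component (crALC) is not reproduced here, and the graph is hypothetical.

    import networkx as nx

    # Toy social graph (hypothetical edges).
    G = nx.Graph([("ana", "bob"), ("bob", "carla"), ("ana", "carla"),
                  ("carla", "davi"), ("davi", "edu"), ("bob", "edu")])

    # Candidate node pairs that are not yet linked.
    candidates = [(u, v) for u in G for v in G
                  if u < v and not G.has_edge(u, v)]

    # Common-neighbour count, a simple graph-based feature.
    cn = {(u, v): len(list(nx.common_neighbors(G, u, v))) for u, v in candidates}

    # Adamic-Adar index, which down-weights high-degree shared neighbours.
    aa = {(u, v): score for u, v, score in nx.adamic_adar_index(G, candidates)}

    for pair in sorted(candidates, key=lambda p: -aa[p]):
        print(pair, "common neighbours:", cn[pair], "Adamic-Adar: %.3f" % aa[pair])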
Abstract:
The ideal approach for the long-term treatment of intestinal disorders, such as inflammatory bowel disease (IBD), is a safe and well-tolerated therapy able to reduce mucosal inflammation and maintain homeostasis of the intestinal microbiota. A combined therapy with antimicrobial agents, to reduce the antigenic load, and immunomodulators, to ameliorate the dysregulated responses, followed by probiotic supplementation, has been proposed. Because of the complementary mechanisms of action of antibiotics and probiotics, a combined therapeutic approach would give advantages in terms of enlargement of the antimicrobial spectrum, due to the barrier effect of probiotic bacteria, and limitation of some side effects of traditional chemotherapy (i.e. indiscriminate decrease of aggressive and protective intestinal bacteria, altered absorption of nutrients, allergic and inflammatory reactions). Rifaximin (4-deoxy-4’-methylpyrido[1’,2’-1,2]imidazo[5,4-c]rifamycin SV) is a product of synthesis experiments designed to modify the parent compound, rifamycin, in order to achieve low gastrointestinal absorption while retaining good antibacterial activity. Both experimental and clinical pharmacology clearly show that this compound is a non-systemic antibiotic with a broad spectrum of antibacterial action, covering Gram-positive and Gram-negative organisms, both aerobes and anaerobes. Being virtually non-absorbed, its bioavailability within the gastrointestinal tract is rather high, with intraluminal and faecal drug concentrations that largely exceed the MIC values observed in vitro against a wide range of pathogenic microorganisms. The gastrointestinal tract therefore represents the primary therapeutic target, and gastrointestinal infections the main indication. The limited activity of rifaximin outside the enteric area minimizes both antimicrobial resistance and systemic adverse events. Fermented dairy products enriched with probiotic bacteria have developed into one of the most successful categories of functional foods. Probiotics are defined as “live microorganisms which, when administered in adequate amounts, confer a health benefit on the host” (FAO/WHO, 2002), and mainly include Lactobacillus and Bifidobacterium species. Probiotic bacteria exert a direct effect on the intestinal microbiota of the host and contribute to the organoleptic, rheological and nutritional properties of food. Administration of pharmaceutical probiotic formulas has been associated with therapeutic effects in the treatment of diarrhoea, constipation, flatulence, colonization by enteropathogens, gastroenteritis, hypercholesterolemia, and IBD, such as ulcerative colitis (UC), Crohn’s disease, pouchitis and irritable bowel syndrome. Prerequisites for probiotics are to be effective and safe. The characteristics of an effective probiotic for gastrointestinal tract disorders are tolerance to the upper gastrointestinal environment (resistance to digestion by enteric or pancreatic enzymes, gastric acid and bile), adhesion to the intestinal surface to lengthen the retention time, ability to prevent the adherence, establishment and/or replication of pathogens, production of antimicrobial substances, degradation of toxic catabolites by bacterial detoxifying enzymatic activities, and modulation of the host immune responses.
This study was carried out using a validated three-stage continuous fermentation system, and it aimed to investigate the effect of rifaximin on the colonic microbial flora of a healthy individual, in terms of bacterial composition and production of fermentative metabolic end products. Moreover, this is the first study that investigates in vitro the impact of the simultaneous administration of the antibiotic rifaximin and the probiotic B. lactis BI07 on the intestinal microbiota. Bacterial groups of interest were evaluated using culture-based methods and molecular culture-independent techniques (FISH, PCR-DGGE). Metabolic outputs, in terms of SCFA profiles, were determined by HPLC analysis. The collected data demonstrated that neither rifaximin alone nor the combined antibiotic and probiotic treatment drastically changed the intestinal microflora, whereas bacteria belonging to Bifidobacterium and Lactobacillus significantly increased over the course of the treatment, suggesting the spontaneous emergence of rifaximin resistance. These results are in agreement with a previous study, in which it was demonstrated that rifaximin administration in patients with UC affects the host with only minor variations of the intestinal microflora, and that the microbiota is restored over a wash-out period. In particular, several rifaximin-resistant Bifidobacterium mutants could be isolated during the antibiotic treatment, but they disappeared after the antibiotic was suspended. Furthermore, bacteria belonging to Atopobium spp. and the E. rectale/Clostridium cluster XIVa increased significantly after the rifaximin and probiotic treatment. The Atopobium genus and the E. rectale/Clostridium cluster XIVa are saccharolytic, butyrate-producing bacteria, and for these characteristics they are widely considered health-promoting microorganisms. The absence of major variations in the intestinal microflora of a healthy individual and the significant increase in the concentrations of probiotic and health-promoting bacteria support the rationale of administering rifaximin as an efficacious and non-dysbiosis-promoting therapy, and suggest the efficacy of a combined antibiotic/probiotic treatment in several gut pathologies, such as IBD. To assess the use of an antibiotic/probiotic combination for the clinical management of intestinal disorders, genetic, proteomic and physiological approaches were employed to elucidate the molecular mechanisms determining rifaximin resistance in Bifidobacterium and the expected interactions occurring in the gut between these bacteria and the drug. The ability of an antimicrobial agent to select for resistance is a relevant factor that affects its usefulness and may diminish its useful life. The rifaximin resistance phenotype was easily acquired by all the bifidobacteria analyzed [type strains of the most representative intestinal bifidobacterial species (B. infantis, B. breve, B. longum, B. adolescentis and B. bifidum) and three bifidobacteria included in a pharmaceutical probiotic preparation (B. lactis BI07, B. breve BBSF and B. longum BL04)] and persisted for more than 400 bacterial generations in the absence of selective pressure. Exclusion of any reversion phenomenon suggested two hypotheses: (i) stable and immobile genetic elements encode resistance; (ii) the drug moiety does not act as an inducer of the resistance phenotype, but enables selection of resistant mutants. Since point mutations in rpoB have been indicated as the principal factor determining rifampicin resistance in E. coli and M. tuberculosis, whether a similar mechanism also occurs in Bifidobacterium was verified.
The analysis of a 129 bp rpoB core region of several wild-type and resistant bifidobacteria revealed five different types of missense mutations in codons 513, 516, 522 and 529. Position 529 was a novel mutation site, not previously described, and position 522 appeared interesting for both the double point substitutions and the heterogeneous profile of nucleotide changes. The sequence heterogeneity of codon 522 in Bifidobacterium suggests an indirect role of its encoded amino acid in binding the rifaximin moiety. These results demonstrated the chromosomal nature of rifaximin resistance in Bifidobacterium, minimizing the risk of horizontal transmission of resistance elements between intestinal microbial species. Further proteomic and physiological investigations were carried out using B. lactis BI07, a component of a pharmaceutical probiotic preparation, as a model strain. The choice of this strain was based on the following elements: (i) B. lactis BI07 is able to survive and persist in the gut; (ii) a proteomic overview of this strain has recently been reported. The involvement of metabolic changes associated with rifaximin resistance was investigated by proteomic analysis performed with two-dimensional electrophoresis and mass spectrometry. Comparative proteomic mapping of BI07-wt and BI07-res revealed that most differences in protein expression patterns were genetically encoded rather than induced by antibiotic exposure. In particular, the rifaximin resistance phenotype was characterized by increased expression levels of stress proteins. Overexpression of stress proteins was expected, as they represent a common non-specific response by bacteria when stimulated by different shock conditions, including exposure to toxic agents like heavy metals, oxidants, acids, bile salts and antibiotics. Also, positive transcription regulators were found to be overexpressed in BI07-res, suggesting that the bacteria could activate compensatory mechanisms to assist the transcription process in the presence of RNA polymerase inhibitors. Other differences in expression profiles were related to proteins involved in central metabolism; these modifications suggest metabolic disadvantages of the resistant mutants compared with sensitive bifidobacteria in the gut environment in the absence of selective pressure, explaining their disappearance from the faeces of patients with UC after interruption of the antibiotic treatment. The differences observed between the BI07-wt and BI07-res proteomic patterns, as well as the high frequency of silent mutations reported for resistant mutants of Bifidobacterium, could be the consequence of an increased mutation rate, a mechanism which may lead to persistence of resistant bacteria in the population. However, the in vivo disappearance of resistant mutants in the absence of selective pressure allows excluding the emergence of compensatory mutations without loss of resistance. Furthermore, the proteomic characterization of the resistant phenotype suggests that rifaximin resistance is associated with reduced bacterial fitness in B. lactis BI07-res, supporting the hypothesis of a biological cost of antibiotic resistance in Bifidobacterium. The hypothesis of rifaximin inactivation by bacterial enzymatic activities was tested using liquid chromatography coupled with tandem mass spectrometry. Neither chemical modifications nor degradation derivatives of the rifaximin moiety were detected.
The exclusion of a biodegradation pathway for the drug was further supported by the quantitative recovery, in BI07-res culture fractions, of the total rifaximin amount (100 μg/ml) added to the culture medium. To confirm the main role of the mutation in the β chain of RNA polymerase in the acquisition of rifaximin resistance, the transcription activity of crude enzymatic extracts of BI07-res cells was evaluated. Although the inhibitory effects of rifaximin on in vitro transcription were markedly higher for BI07-wt than for BI07-res, a partial resistance of the mutated RNA polymerase at rifaximin concentrations > 10 μg/ml was inferred, on the basis of the calculated differences in inhibition percentages between BI07-wt and BI07-res. Considering the resistance of intact BI07-res cells to rifaximin concentrations > 100 μg/ml, supplementary resistance mechanisms may operate in vivo. A barrier to rifaximin uptake in BI07-res cells was suggested in this study, on the basis of the larger portion of the antibiotic found bound to the cellular pellet compared with the portion recovered in the cellular lysate. Related to this finding, a resistance mechanism involving changes in membrane permeability was proposed. A previous study supports this hypothesis, demonstrating the involvement of surface properties and permeability in natural resistance to rifampicin in mycobacteria, isolated from cases of human infection, which possessed a rifampicin-susceptible RNA polymerase. To understand the mechanism of the membrane barrier, variations in the percentages of saturated and unsaturated FAs and their methylation products in BI07-wt and BI07-res membranes were investigated. While saturated FAs confer rigidity to the membrane and resistance to stress agents, such as antibiotics, a high level of lipid unsaturation is associated with high fluidity and susceptibility to stresses. Thus, the higher percentage of saturated FAs during the stationary phase of BI07-res could represent a defence mechanism of the mutant cells to prevent antibiotic uptake. Furthermore, the increase of CFAs such as dihydrosterculic acid during the stationary phase of BI07-res suggests that this CFA could be more suitable than its isomer, lactobacillic acid, to interact with and prevent the penetration of exogenous molecules, including rifaximin. Finally, the impact of rifaximin on the immune regulatory functions of the gut was evaluated. A potential anti-inflammatory effect of rifaximin has been suggested, with reduced secretion of IFN-γ in a rodent model of colitis. Analogously, a significant decrease in IL-8, MCP-1, MCP-3 and IL-10 levels has been reported in patients affected by pouchitis treated with a combined therapy of rifaximin and ciprofloxacin. Since rifaximin enables in vivo and in vitro selection of rifaximin-resistant Bifidobacterium mutants with high frequency, the immunomodulatory activities of rifaximin associated with a resistant B. lactis mutant were also taken into account. Data obtained from PBMC stimulation experiments suggest the following conclusions: (i) rifaximin does not exert any effect on the production of IL-1β, IL-6 and IL-10, whereas it weakly stimulates production of TNF-α; (ii) B. lactis appears to be a good inducer of IL-1β, IL-6 and TNF-α; (iii) the combination of BI07-res and rifaximin exhibits a lower stimulation effect than BI07-res alone, especially for IL-6.
These results confirm the potential anti-inflammatory effect of rifaximin and are in agreement with several studies that report a transient pro-inflammatory response associated with probiotic administration. Understanding the molecular factors determining rifaximin resistance in the genus Bifidobacterium has applied significance at the pharmaceutical and medical level, as it provides the scientific basis to justify the simultaneous use of the antibiotic rifaximin and probiotic bifidobacteria in the clinical treatment of intestinal disorders.
Abstract:
In this work we aim to propose a new approach for preliminary epidemiological studies on Standardized Mortality Ratios (SMR) collected in many spatial regions. A preliminary study on SMRs aims to formulate hypotheses to be investigated via individual epidemiological studies, which avoid the bias carried by aggregated analyses. Starting from the collected disease counts and the expected disease counts calculated from reference population disease rates, in each area an SMR is derived as the MLE under a Poisson assumption on each observation. Such estimators have high standard errors in small areas, i.e. where the expected count is low either because of the small population underlying the area or the rarity of the disease under study. Disease mapping models and other techniques for screening disease rates across the map, aiming to detect anomalies and possible high-risk areas, have been proposed in the literature under both the classical and the Bayesian paradigm. Our proposal approaches this issue with a decision-oriented method, which focuses on multiple testing control, without however abandoning the preliminary-study perspective that an analysis of SMR indicators is required to keep. We implement control of the FDR, a quantity largely used to address multiple comparison problems in the field of microarray data analysis but not usually employed in disease mapping. Controlling the FDR means providing an estimate of the FDR for a set of rejected null hypotheses. The small-areas issue raises difficulties in applying traditional methods for FDR estimation, which are usually based only on knowledge of the p-values (Benjamini and Hochberg, 1995; Storey, 2003). Tests evaluated by a traditional p-value provide weak power in small areas, where the expected number of disease cases is small. Moreover, tests cannot be assumed to be independent when spatial correlation between SMRs is expected, nor are they identically distributed when the population underlying the map is heterogeneous. The Bayesian paradigm offers a way to overcome the inappropriateness of p-value-based methods. Another peculiarity of the present work is to propose a hierarchical fully Bayesian model for FDR estimation when testing many null hypotheses of absence of risk. We use concepts from Bayesian disease mapping models, referring in particular to the Besag, York and Mollié model (1991), often used in practice for its flexible prior assumption on the distribution of risks across regions. The borrowing of strength between prior and likelihood, typical of a hierarchical Bayesian model, has the advantage of evaluating a single test (i.e. a test in a single area) by means of all observations in the map under study, rather than by means of the single observation alone. This improves the power of the test in small areas and addresses more appropriately the spatial correlation issue, which suggests that relative risks are closer in spatially contiguous regions. The proposed model aims to estimate the FDR by means of the MCMC-estimated posterior probabilities b_i of the null hypothesis (absence of risk) for each area. An estimate of the expected FDR conditional on the data (denoted FDR-hat) can be calculated for any set of b_i's relative to areas declared at high risk (where the null hypothesis is rejected) by averaging the b_i's themselves. The FDR-hat can be used to provide an easy decision rule for selecting high-risk areas, i.e. selecting as many areas as possible such that the FDR-hat does not exceed a prefixed value; we call these FDR-hat based decision (or selection) rules.
The sensitivity and specificity of such a rule depend on the accuracy of the FDR estimate: over-estimation of the FDR causes a loss of power, and under-estimation of the FDR produces a loss of specificity. Moreover, our model has the interesting feature of still being able to provide an estimate of the relative risk values, as in the Besag, York and Mollié model (1991). A simulation study was set up to evaluate the model performance in terms of FDR estimation accuracy, sensitivity and specificity of the decision rule, and goodness of estimation of the relative risks. We chose a real map from which we generated several spatial scenarios whose disease counts vary according to the degree of spatial correlation, the size of the areas, the number of areas where the null hypothesis is true, and the risk level in the latter areas. In summarizing the simulation results we always consider FDR estimation in sets constituted by all b_i's lower than a threshold t. We show graphs of FDR-hat and the true FDR (known by simulation) plotted against the threshold t to assess the FDR estimation. By varying the threshold we can learn which FDR values can be accurately estimated by a practitioner willing to apply the model (from the closeness between FDR-hat and the true FDR). By plotting the calculated sensitivity and specificity (both known by simulation) against FDR-hat we can check the sensitivity and specificity of the corresponding FDR-hat based decision rules. To investigate the over-smoothing of the relative risk estimates we compare box-plots of such estimates in the high-risk areas (known by simulation), obtained with both our model and the classic Besag, York and Mollié model. All the summary tools are worked out for all simulated scenarios (54 scenarios in total). Results show that the FDR is well estimated (in the worst case we get an over-estimation, hence conservative FDR control) in the scenarios with small areas, low risk levels and spatially correlated risks, which are our primary targets. In such scenarios we obtain good estimates of the FDR for all values less than or equal to 0.10. The sensitivity of FDR-hat based decision rules is generally low but the specificity is high. In these scenarios the use of an FDR-hat = 0.05 or FDR-hat = 0.10 based selection rule can be suggested. In cases where the number of true alternative hypotheses (the number of true high-risk areas) is small, FDR values up to 0.15 are also well estimated, and an FDR-hat = 0.15 based decision rule gains power while maintaining a high specificity. On the other hand, in scenarios with non-small areas and non-small risk levels the FDR is under-estimated except for very small values (much lower than 0.05), resulting in a loss of specificity of an FDR-hat = 0.05 based decision rule. In such scenarios FDR-hat = 0.05 or, even worse, FDR-hat = 0.1 based decision rules cannot be recommended because the true FDR is actually much higher. As regards relative risk estimation, our model achieves almost the same results as the classic Besag, York and Mollié model. For this reason, our model is interesting for its ability to perform both the estimation of relative risk values and FDR control, except in the scenarios with non-small areas and large risk levels. A case study is finally presented to show how the method can be used in epidemiology.
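A minimal sketch of the FDR-hat selection rule described above, assuming the per-area posterior null probabilities b_i have already been obtained from an MCMC fit (the values below are hypothetical): areas are ranked by b_i and the largest set whose average b_i does not exceed the chosen level is flagged as high risk.

    import numpy as np

    # Hypothetical MCMC posterior probabilities of "no excess risk" for each area.
    b = np.array([0.01, 0.03, 0.08, 0.20, 0.45, 0.70, 0.90])
    areas = np.array(["A", "B", "C", "D", "E", "F", "G"])
    alpha = 0.10  # target estimated FDR level

    # Sort areas by increasing posterior null probability.
    order = np.argsort(b)
    fdr_hat = np.cumsum(b[order]) / np.arange(1, len(b) + 1)  # running mean of the b_i's

    # Largest prefix of the ranking whose estimated FDR stays at or below alpha.
    k = int(np.max(np.where(fdr_hat <= alpha)[0]) + 1) if np.any(fdr_hat <= alpha) else 0
    selected = areas[order][:k]
    print("high-risk areas:", list(selected), "estimated FDR:", fdr_hat[k - 1] if k else None)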
Abstract:
In this work, the seismic upgrading of existing masonry structures by means of hysteretic ADAS dampers is treated. The ADAS devices are installed on external concrete walls, built parallel to the building, and then linked to the building's slabs by means of a steel-rod connection system. In order to assess the effectiveness of the intervention, a parametric study considering the variation of the dampers' main features has been conducted. To this aim, the concepts of the equivalent linear system (ELS) and equivalent viscous damping are examined in depth. The results of the simplified equivalent linear model are then checked against the results for the yielding structures. Two alternative displacement-based methods for damper design are proposed herein. Both methods have been validated through nonlinear time-history analyses with spectrum-compatible accelerograms. Finally, an ADAS arrangement for the non-conventional implementation is proposed.
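For context on the equivalent viscous damping concept mentioned above (a standard textbook relation, not necessarily the exact formulation adopted in this work), the damping ratio of an equivalent linear system is commonly obtained by equating the energy dissipated per cycle by the hysteretic device to that of a viscous damper:

    \xi_{eq} = \frac{E_D}{4\pi E_{S0}}

where E_D is the energy dissipated by the device in one cycle of harmonic motion at the peak displacement and E_{S0} is the corresponding elastic strain energy.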
Abstract:
The aim of this work was the development of computer-based methods for producing a landslide hazard indication map for the Rheinhessen region, in order to minimize the landslide hazard. To this end, two statistical techniques (discriminant analysis, logistic regression) and a method from the field of artificial intelligence (fuzzy logic) were used to classify the potential hazard of slopes, including those that have not yet shown any mass movements. Since engineering-geological and geotechnical slope investigations are not feasible at the regional scale for reasons of time and cost, use was made of pointwise data on individual landslides from the winter of 1981/82, compiled in a landslide database; the insights gained from these data on process mechanisms and triggering factors were exploited and integrated into each model. The spatially distributed data (lithology, slope angle, land use, etc.) required for the calculation of slope stability were obtained by remote sensing methods, the digitization of maps and the evaluation of digital terrain models (relief analysis). For a more detailed investigation of individual areas classified as landslide-prone in the hazard indication map, a method based on the infinite-slope-stability model was examined for a test area; at the scale of base maps (1:5000) it also takes geotechnical and hydrogeological parameters into account and thus allows a more accurate hazard assessment adapted to the respective climatic situation.
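As a rough sketch of the statistical part of the workflow described above (not the thesis's actual data or model), the snippet below fits a logistic regression with scikit-learn to hypothetical per-cell slope attributes and predicts landslide susceptibility; the feature names, data and coefficients are illustrative only.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)

    # Hypothetical raster cells: slope angle (degrees), clay-rich lithology flag, vineyard land-use flag.
    n = 500
    slope = rng.uniform(0, 35, n)
    clay = rng.integers(0, 2, n)
    vineyard = rng.integers(0, 2, n)
    X = np.column_stack([slope, clay, vineyard])

    # Synthetic "observed landslide" labels, generated only to make the example runnable.
    logit = 0.15 * slope + 1.0 * clay + 0.5 * vineyard - 5.0
    y = rng.random(n) < 1 / (1 + np.exp(-logit))

    model = LogisticRegression().fit(X, y)

    # Predicted susceptibility for two hypothetical cells: a gentle pasture and a steep clay vineyard slope.
    cells = np.array([[5.0, 0, 0], [30.0, 1, 1]])
    print(model.predict_proba(cells)[:, 1])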
Abstract:
In the last decade, the reverse vaccinology approach shifted the paradigm of vaccine discovery from conventional culture-based methods to high-throughput genome-based approaches for the development of recombinant protein-based vaccines against pathogenic bacteria. Besides reaching its main goal of identifying new vaccine candidates, this new procedure also produced a huge amount of molecular knowledge related to them. In the present work, we explored this knowledge in a species-independent way and performed a systematic in silico molecular analysis of more than 100 protective antigens, looking at their sequence similarity, domain composition and protein architecture in order to identify possible common molecular features. This meta-analysis revealed that, besides a low sequence similarity, most of the known bacterial protective antigens share structural/functional Pfam domains as well as specific protein architectures. Based on this, we formulated the hypothesis that the occurrence of these molecular signatures can be predictive of possible protective properties of other proteins in different bacterial species. We tested this hypothesis in Streptococcus agalactiae and identified four new protective antigens. Moreover, in order to provide a second proof of concept for our approach, we used Staphylococcus aureus as a second pathogen and identified five new protective antigens. This new knowledge-driven selection process, named MetaVaccinology, represents the first in silico vaccine discovery tool based on conserved and predictive molecular and structural features of bacterial protective antigens, and is not dependent upon the prediction of their sub-cellular localization.
Abstract:
The literature offers different ways to carry out cluster analysis of categorical data, and the choice among them is strongly related to the aim of the researcher, if we leave time and economic constraints aside. The main clustering approaches are usually divided into model-based and distance-based methods: the former assume that objects belonging to the same class are similar in the sense that their observed values come from the same probability distribution, whose parameters are unknown and need to be estimated; the latter evaluate distances among objects with a defined dissimilarity measure and, based on it, allocate units to the closest group. In clustering, one may be interested in grouping similar objects, or in finding observations that come from the same true homogeneous distribution. But do both of these aims lead to the same clustering? And how good are clustering methods designed to fulfil one of these aims in terms of the other? To answer these questions, two approaches, namely a latent class model (mixture of multinomial distributions) and a partitioning around medoids method, are evaluated and compared by means of the Adjusted Rand Index, the Average Silhouette Width and the Pearson-Gamma index in a fairly wide simulation study. Simulation outcomes are plotted in two-dimensional graphs via Multidimensional Scaling; the size of the points is proportional to the number of overlapping points, and different colours are used according to cluster membership.
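As a small illustration of the evaluation indices mentioned above (not the study's simulation design), the snippet below generates categorical data from two multinomial populations, clusters it with a simple distance-based stand-in (hierarchical clustering on Hamming dissimilarities; the latent class and PAM methods themselves are not reproduced, and the Pearson-Gamma index is omitted), and computes the Adjusted Rand Index and Average Silhouette Width; all settings are hypothetical.

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import pdist, squareform
    from sklearn.metrics import adjusted_rand_score, silhouette_score

    rng = np.random.default_rng(2)

    # Two hypothetical "true" populations over 5 categorical variables with 3 levels each.
    p1 = np.array([[0.7, 0.2, 0.1]] * 5)
    p2 = np.array([[0.1, 0.3, 0.6]] * 5)
    X = np.vstack([
        np.column_stack([rng.choice(3, 100, p=p) for p in p1]),
        np.column_stack([rng.choice(3, 100, p=p) for p in p2]),
    ])
    truth = np.repeat([0, 1], 100)

    # Distance-based clustering: Hamming dissimilarity + average-linkage hierarchical clustering.
    D = pdist(X, metric="hamming")
    labels = fcluster(linkage(D, method="average"), t=2, criterion="maxclust")

    # Agreement with the true partition and internal cluster quality.
    print("Adjusted Rand Index:", adjusted_rand_score(truth, labels))
    print("Average Silhouette Width:", silhouette_score(squareform(D), labels, metric="precomputed"))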
Abstract:
Chronic obstructive pulmonary disease (COPD) is an umbrella term for diseases that lead to coughing, sputum production and dyspnoea (shortness of breath) at rest or under exertion; chronic bronchitis and pulmonary emphysema are counted among them. The progression of COPD is closely linked to an increase in the wall volume of the small airways (bronchi). High-resolution computed tomography (CT) is considered the gold standard (the best and most reliable diagnostic method) for examining the morphology of the lung. When measuring bronchi, which are approximately tubular structures, in CT images, the small size of the bronchi compared with the resolving power of a clinical CT scanner poses a major problem. This thesis shows how CT images are computed from conventional X-ray projections, where the mathematical and physical sources of error in the image formation process lie, and how a CT system can be made mathematically tractable by interpreting it as a linear shift-invariant (LSI) system. Based on linear systems theory, ways of describing the resolving power of imaging methods are derived. It is shown how the tracheobronchial tree can be robustly segmented from a CT data set and, by means of a topology-preserving 3-dimensional skeletonization algorithm, converted into a skeleton representation and subsequently into an acyclic graph. Based on linear systems theory, a new, promising, integral-based method (IBM) for measuring small structures in CT images is presented. To validate the IBM results, various measurements were carried out on a phantom consisting of 10 different silicone tubes. Using the skeleton and graph representations, the complete segmented tracheobronchial tree can be measured in 3-dimensional space. For 8 pigs that were each scanned twice, good reproducibility of the IBM results was demonstrated. In a further study carried out with IBM, it was shown that the average percentage bronchial wall thickness in CT data sets of 16 smokers is significantly higher than in data sets of 15 non-smokers. IBM may also be usable for wall-thickness measurements in problems from other fields, or can at least serve as a source of ideas. An article describing the developed methodology and the study results obtained with it has been accepted for publication in the journal IEEE Transactions on Medical Imaging.
Abstract:
The verification of numerical models is indispensable for improving quantitative precipitation forecasting (QPF). The aim of the present work is the development of new methods for verifying the precipitation forecasts of the regional model of MeteoSwiss (COSMO-aLMo) and of the global model of the European Centre for Medium-Range Weather Forecasts (ECMWF). For this purpose, a novel observational data set for Germany with hourly resolution was produced and applied. For the evaluation of the model forecasts, the new quality measure "SAL" was developed. The novel, temporally and spatially highly resolved observational data set for Germany is created with the disaggregation method developed during MAP (Mesoscale Alpine Programme). The idea is to combine the high temporal resolution of the radar data (hourly) with the accuracy of the precipitation amounts from station measurements (within measurement error). This disaggregated data set offers new possibilities for the quantitative verification of precipitation forecasts. For the first time, an area-wide analysis of the diurnal cycle of precipitation was carried out. It showed that in winter there is no diurnal cycle, and that this is well reproduced by COSMO-aLMo. In summer, by contrast, both the disaggregated data set and COSMO-aLMo show a pronounced diurnal cycle, although the precipitation maximum in COSMO-aLMo sets in too early, between 11-14 UTC compared with 15-20 UTC in the observations, and is clearly overestimated by a factor of about 1.5. A new quality measure was developed because conventional, grid-point-based error measures no longer do justice to model development. SAL consists of three independent components and is based on the identification of precipitation objects (threshold-dependent) within a region (e.g. a river catchment). Differences between the model and observed precipitation fields are computed with respect to structure (S), amplitude (A) and location (L) within the region. SAL was tested extensively with idealized and real examples. SAL detects and confirms known model deficits such as the diurnal-cycle problem and the simulation of too many relatively weak precipitation events. It provides additional insight into the characteristics of the errors, e.g. whether they are mainly errors in amplitude, in the displacement of a precipitation field, or in structure (e.g. stratiform versus small-scale convective). Daily and hourly precipitation sums of COSMO-aLMo and the ECMWF model were verified with SAL. In a statistical sense, SAL shows that, especially for stronger (and thus societally relevant) precipitation events, the quality of the COSMO-aLMo forecasts is good compared with that for weak precipitation. Comparing the two models, it was shown that the global model predicts more widespread precipitation and thus larger objects, while COSMO-aLMo shows clearly more realistic precipitation structures. Given the resolutions of the models this is not surprising, but it could not be demonstrated with conventional error measures. The methods developed in this work are very useful for the verification of QPF with temporally and spatially highly resolved models. The use of the disaggregated observational data set together with SAL as a quality measure provides new insights into QPF and allows more appropriate statements about the quality of precipitation forecasts.
Future applications of SAL lie in the verification of the new generation of numerical weather prediction models that explicitly simulate the life cycle of deep convective cells.
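As a small, hedged illustration of the object-independent part of SAL (the amplitude component in its commonly cited normalized form; the structure and location components, which require object identification, are not reproduced here), the following snippet compares the domain-averaged precipitation of two hypothetical fields:

    import numpy as np

    def amplitude_component(model_field, obs_field):
        """Normalized difference of domain-averaged precipitation, bounded in [-2, 2]."""
        d_mod = np.mean(model_field)
        d_obs = np.mean(obs_field)
        return (d_mod - d_obs) / (0.5 * (d_mod + d_obs))

    # Hypothetical 2-D precipitation fields (mm/h) on the same grid.
    rng = np.random.default_rng(3)
    obs = rng.gamma(shape=0.8, scale=1.0, size=(50, 50))
    model = 1.5 * obs  # a model overestimating the amplitude by 50%

    print("A =", round(amplitude_component(model, obs), 3))  # positive value: model too wet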
Abstract:
This thesis addresses the problem of scaling reinforcement learning to high-dimensional and complex tasks. Reinforcement learning here denotes a class of learning methods based on approximate dynamic programming, which is used in particular in artificial intelligence and can serve for the autonomous control of simulated agents or real hardware robots in dynamic and uncertain environments. To this end, regression on samples is used to determine a function that solves an "optimality equation" (Bellman) and from which approximately optimal decisions can be derived. A major hurdle is the dimensionality of the state space, which is often high and therefore poorly accessible to traditional grid-based approximation schemes. The goal of this work is to make reinforcement learning applicable to, in principle, arbitrarily high-dimensional problems by means of non-parametric function approximation (more precisely, regularization networks). Regularization networks are a generalization of ordinary basis-function networks that parameterize the sought solution by the data, so that the explicit choice of nodes/basis functions is no longer needed and the "curse of dimensionality" can be avoided for high-dimensional inputs. At the same time, regularization networks are also linear approximators, which are technically easy to handle and for which the existing convergence results for reinforcement learning remain valid (unlike, for example, feed-forward neural networks). All these theoretical advantages, however, come with a very practical problem: the computational cost of regularization networks naturally scales as O(n**3), where n is the number of data points. This is particularly problematic because in reinforcement learning the learning process is online: the samples are generated by an agent/robot while it interacts with the environment. Adjustments to the solution must therefore be made immediately and with little computational effort. The contribution of this work is therefore divided into two parts. In the first part we formulate, for regularization networks, an efficient learning algorithm for solving general regression tasks that is specifically tailored to the requirements of online learning. Our approach is based on the recursive least-squares procedure, but can insert not only new data but also new basis functions into the existing model in constant time. This is made possible by the "subset of regressors" approximation, whereby the kernel is approximated by a strongly reduced selection of training data, and by a greedy selection procedure that picks these basis elements directly from the data stream at run time. In the second part we transfer this algorithm to approximate policy evaluation using least-squares based temporal-difference learning, and integrate this building block into a complete system for the autonomous learning of optimal behaviour. Overall, we develop a highly data-efficient method that is particularly suited to learning problems from robotics with continuous, high-dimensional state spaces and stochastic state transitions.
In doing so, we do not rely on a model of the environment, work largely independently of the dimension of the state space, achieve convergence with relatively few agent-environment interactions, and, thanks to the efficient online algorithm, can also operate in the context of time-critical real-time applications. We demonstrate the capability of our approach on two realistic and complex application examples: the RoboCup-Keepaway problem and the control of a (simulated) octopus tentacle.
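As a rough, hedged illustration of the online recursive least-squares idea underlying the approach described above (the thesis's subset-of-regressors machinery and the temporal-difference extension are not reproduced), the sketch below updates a linear model one sample at a time via the Sherman-Morrison identity, so each update costs O(d^2) in the number of features rather than refitting from scratch; the data are synthetic.

    import numpy as np

    rng = np.random.default_rng(4)
    d = 5                       # number of features (basis functions)
    w = np.zeros(d)             # current weight estimate
    P = np.eye(d) * 1e3         # inverse of the regularized Gram matrix

    true_w = rng.normal(size=d)

    for t in range(1000):
        phi = rng.normal(size=d)                 # feature vector of the new sample
        y = true_w @ phi + 0.1 * rng.normal()    # noisy target

        # Recursive least-squares update (Sherman-Morrison): O(d^2) per sample.
        Pphi = P @ phi
        gain = Pphi / (1.0 + phi @ Pphi)
        w = w + gain * (y - w @ phi)
        P = P - np.outer(gain, Pphi)

    print("weight error:", np.linalg.norm(w - true_w))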