962 results for Clustering a large document collection


Relevance:

100.00% 100.00%

Publisher:

Abstract:

This dissertation discusses part of the career of the writer Haroldo Maranhão (1927-2004), as revealed by the documents in his personal archive. The study is organized around three perspectives: Haroldo Maranhão the reader, owner of a book collection accumulated over many years; Haroldo Maranhão the journalist, born and professionally trained within a clan that for half a century ran one of the most influential newspapers in the capital of Pará, the Folha do Norte; and Haroldo Maranhão the writer, in his frequent clashes with the practices that govern the logic of the publishing world.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Classification is a fundamental activity in the organization and management of archival document collections, since it sets the priorities for description and the basis for appraisal procedures. Drawing on works fundamental to the theoretical and methodological development of Archival Science, this article characterizes the historical and conceptual background of the notion of classification. It aims to answer questions about how the importance of classification grew as its theory developed and about its use today. It also seeks to characterize the history of Archival Science as a discipline, since classification was one of the first activities to be theorized.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

In this dissertation, several candidate genes for Wilms tumor (WT), a tumor of the kidney, were identified and characterized. Because this early-childhood malignancy results from incorrectly proceeding metanephrogenesis, the gene expression patterns of various human Wilms tumor and normal kidney tissues (adult and fetal kidney) were compared using the differential display technique, and the genes identified as differentially expressed were cloned and characterized. TM7SF1 is a novel gene whose transcription is switched on in the course of metanephrogenesis. Based on structure predictions, the putative protein it encodes can presumably be assigned to the family of G protein-coupled receptors. Its inferred function as a signaling molecule in kidney development, together with its localization in a WT locus (1q42-q43), makes TM7SF1 a promising candidate gene for WT. In addition, the prerequisites for functional tests that allow further characterization of TM7SF1 were established (identification and cloning of the murine homolog, stably overexpressing WT cell lines, antibodies against the amino terminus of the putative protein). With TCF2, a further gene was identified whose product plays a role in processes of metanephrogenesis. The significant downregulation of TCF2 expression in the large majority of the WTs examined, its regulation by the WT1 gene product shown in this work, and its genomic localization in an interval for the familial form of WT (FWT1 in 17q12-q21) show the potential of TCF2 as a candidate gene for FWT. Furthermore, GLI3 was identified as a gene strongly expressed in various WTs. Its product is a component of the sonic hedgehog signal transduction pathway, which is relevant to developmental biology and involved in various tumor diseases.
With FE7A3 and CDT151, two differentially expressed cDNAs were identified that represent parts of novel genes and that could be mapped to WT loci. Based on homology comparisons in the region of the identified open reading frames, a possible role of the putative gene products in WT pathogenesis was worked out, as a cell adhesion molecule (FE7A3) or as a proliferation-associated transcription factor (CDT151). Besides the comparative gene expression studies, a second approach analyzed the transcriptional regulation of the only Wilms tumor gene cloned so far (WT1). Using comparative reporter gene analyses in WT1-expressing and non-expressing cell lines, new regions relevant to the transcriptional regulation of WT1 were identified. In addition, the functional antagonism described for the transcription factors SP1 and SP3 at other promoters was investigated for WT1 expression and correlated, in gel retardation analyses, with the WT1 expression status of the above-mentioned cell lines.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The first part of this paper gives a brief introduction to maritime missiology, the second traces the beginnings of the Boston Seaman’s Friend Society in the nineteenth century, and the third focuses on the Vineyard Haven branch of that work well into the twentieth century. The paper draws on source material from the American Seamen’s Friend Society (there is a 5,000-document collection of the ASFS papers in the G.W. Blunt White Library at Mystic Seaport), from the Boston Seaman’s Friend Society (whose papers are mostly in the Congregational House on Beacon Hill in Boston), and on other secondary works from the nineteenth and twentieth centuries. I am especially indebted to George Wiseman’s book, They Kept the Lower Lights Burning; Wiseman was the pastor of Trinity Methodist Episcopal Church in Oak Bluff during WWII and the son-in-law of Austin Tower. This presentation will look at the many facets that made up religious work among seafarers.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Nowadays, there is a significant quantity of linguistic data available on the Web. However, linguistic resources are often published in proprietary formats and, as such, can be difficult to interface with one another, so they end up confined in “data silos”. The creation of web standards for publishing data on the Web and projects to create Linked Data have led to interest in resources that can be published using Web principles. One of the most important aspects of “Lexical Linked Data” is the sharing of lexica and machine-readable dictionaries. It is for this reason that the lemon format has been proposed, which we briefly describe. We then consider two resources that seem ideal candidates for the Linked Data cloud, namely WordNet 3.0 and Wiktionary, a large document-based dictionary. We discuss the challenges of converting both resources to lemon and, in particular for Wiktionary, the challenge of processing the mark-up and handling inconsistencies and underspecification in the source material. Finally, we turn to the task of creating links between the two resources and present a novel algorithm for linking lexica as lexical Linked Data.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

Statistically significant charge clusters (basic, acidic, or of mixed charge) in tertiary protein structures are identified by new methods from a large representative collection of protein structures. About 10% of protein structures show at least one charge cluster, mostly of mixed type involving about equally anionic and cationic residues. Positive charge clusters are very rare. Negative (or histidine-acidic) charge clusters often coordinate calcium, or magnesium or zinc ions [e.g., thermolysin (PDB code: 3tln), mannose-binding protein (2msb), aminopeptidase (1amp)]. Mixed-charge clusters are prominent at interchain contacts where they stabilize quaternary protein formation [e.g., glutathione S-transferase (2gst), catalase (8act), and fructose-1,6-bisphosphate aldolase (1fba)]. They are also involved in protein-protein interaction and in substrate binding. For example, the mixed-charge cluster of aspartate carbamoyl-transferase (8atc) envelops the aspartate carbonyl substrate in a flexible manner (alternating tense and relaxed states) where charge associations can vary from weak to strong. Other proteins with charge clusters include the P450 cytochrome family (BM-3, Terp, Cam), several flavocytochromes, neuraminidase, hemagglutinin, the photosynthetic reaction center, and annexin. In each case in Table 2 we discuss the possible role of the charge clusters with respect to protein structure and function.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

This paper describes a CL-SR system that employs two different techniques: the first is based on NLP rules that apply logic forms to topic processing, while the second applies the IR-n statistical search engine to the spoken document collection. Applying logic forms to the topics makes it possible to increase the weight of topic terms according to a set of syntactic rules. The IR-n system then uses these topic term weights in the information retrieval process.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

It is important to help researchers find valuable papers from a large literature collection. To this end, many graph-based ranking algorithms have been proposed. However, most of these algorithms suffer from the problem of ranking bias. Ranking bias hurts the usefulness of a ranking algorithm because it returns a ranking list with an undesirable time distribution. This paper is a focused study on how to alleviate ranking bias by leveraging the heterogeneous network structure of the literature collection. We propose a new graph-based ranking algorithm, MutualRank, that integrates mutual reinforcement relationships among networks of papers, researchers, and venues to achieve a more synthetic, accurate, and less-biased ranking than previous methods. MutualRank provides a unified model that involves both intra- and inter-network information for ranking papers, researchers, and venues simultaneously. We use the ACL Anthology Network as the benchmark data set and construct the gold standard from computer linguistics course websites of well-known universities and two well-known textbooks. The experimental results show that MutualRank greatly outperforms the state-of-the-art competitors, including PageRank, HITS, CoRank, Future Rank, and P-Rank, in ranking papers in both improving ranking effectiveness and alleviating ranking bias. Rankings of researchers and venues by MutualRank are also quite reasonable.
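For orientation, the sketch below shows PageRank, one of the baselines named in the abstract, as a plain power iteration over a toy citation graph. It is not MutualRank, whose update rules the abstract does not give. The graph, the paper names, the damping factor of 0.85, and the iteration count are all illustrative assumptions.

```python
from typing import Dict, List


def pagerank(citations: Dict[str, List[str]],
             damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank; citations maps a paper to the papers it cites."""
    # Collect every paper that cites or is cited.
    nodes = sorted(set(citations) | {t for ts in citations.values() for t in ts})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every paper starts from the uniform "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = citations.get(node, [])
            if targets:
                # Split this paper's rank evenly among the papers it cites.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A paper that cites nothing spreads its rank over all papers.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: paper_c cites a and b, paper_b cites a.
toy_graph = {"paper_c": ["paper_a", "paper_b"], "paper_b": ["paper_a"], "paper_a": []}
scores = pagerank(toy_graph)
```

In this toy graph the oldest paper, `paper_a`, collects the most rank simply because later papers can cite it and not the reverse, which gives a small illustration of the time-related ranking bias the abstract refers to.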

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The purpose of this study was to identify the strengths and strategies that undocumented college students from Central America used to access and persist in United States higher education. A multiple-case study design was used to conduct in-depth, semi-structured interviews and document collection from ten persons residing in Illinois, Maryland, Ohio, Texas, and Washington. Yosso’s (2005, 2006) community cultural wealth conceptual framework, an analytical and methodological tool, was used to uncover assets used to navigate the higher education system. The findings revealed that participants activated all forms of capital, with cultural capital being the least activated yet necessary, to access and persist in college. Participants also activated most forms of capital together or consecutively in order to attain financial resources, information and social networks that facilitated college access. Participants successfully persisted because they continued to activate forms of capital, displayed a high sense of agency, and managed to sustain college educational goals despite challenges and other external factors. The relationships among forms of capital and federal, state, and institutional policy contexts, which positively influenced both college access and persistence were not illustrated in Yosso’s (2005, 2006) community cultural wealth framework. Therefore, this study presents a modified community cultural wealth framework, which includes these intersections and contexts. In the spirit of Latina/o critical race theory (LatCrit) and critical race theory (CRT), the participants share with other undocumented students suggestions on how to succeed in college. This study can contribute to the growing research of undocumented college students, and develop higher education policy and practice that intentionally consider undocumented college students’ strengths to successfully navigate the institution.

Relevance:

50.00% 50.00%

Publisher:

Abstract:

MOTIVATION: Analysis of millions of pyro-sequences is currently playing a crucial role in the advance of environmental microbiology. Taxonomy-independent, i.e. unsupervised, clustering of these sequences is essential for the definition of Operational Taxonomic Units. For this application, reproducibility and robustness should be the most sought after qualities, but have thus far largely been overlooked. RESULTS: More than 1 million hyper-variable internal transcribed spacer 1 (ITS1) sequences of fungal origin have been analyzed. The ITS1 sequences were first properly extracted from 454 reads using generalized profiles. Then, otupipe, cd-hit-454, ESPRIT-Tree and DBC454, a new algorithm presented here, were used to analyze the sequences. A numerical assay was developed to measure the reproducibility and robustness of these algorithms. DBC454 was the most robust, closely followed by ESPRIT-Tree. DBC454 features density-based hierarchical clustering, which complements the other methods by providing insights into the structure of the data. AVAILABILITY: An executable is freely available for non-commercial users at ftp://ftp.vital-it.ch/tools/dbc454. It is designed to run under MPI on a cluster of 64-bit Linux machines running Red Hat 4.x, or on a multi-core OSX system. CONTACT: dbc454@vital-it.ch or nicolas.guex@isb-sib.ch.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Recent efforts to characterize the properties of air-water flows have included some analysis of clustering processes. A cluster of bubbles is defined as a group of two or more bubbles, with a distinct separation from other bubbles before and after the cluster. The present paper compares the results of clustering processes in two hydraulic structures: a large-size dropshaft and a hydraulic jump in a rectangular horizontal channel. The comparison highlighted some significant differences in clustering production and structures. Both dropshaft and hydraulic jump flows are complex turbulent shear flows, and a clustering index may provide some measure of the bubble-turbulence interactions and associated energy dissipation.
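The cluster definition in this abstract can be illustrated with a minimal sketch. It assumes bubbles are recorded as arrival times at a probe and that a "distinct separation" means a gap larger than a fixed threshold. The abstract does not state the actual criterion, so `max_gap` and the sample times are hypothetical.

```python
from typing import List, Sequence


def find_bubble_clusters(arrival_times: Sequence[float],
                         max_gap: float) -> List[List[float]]:
    """Group bubble arrival times into clusters of two or more bubbles.

    Consecutive bubbles belong to the same group when the gap between them
    is at most max_gap (an assumed criterion, not taken from the paper).
    """
    clusters: List[List[float]] = []
    current: List[float] = []
    for t in sorted(arrival_times):
        # A gap larger than max_gap closes the current group.
        if current and t - current[-1] > max_gap:
            if len(current) >= 2:  # isolated bubbles are not clusters
                clusters.append(current)
            current = []
        current.append(t)
    if len(current) >= 2:
        clusters.append(current)
    return clusters


# Hypothetical arrival times in seconds; the bubble at 1.0 s is isolated.
times = [0.0, 0.1, 0.15, 1.0, 2.0, 2.05]
clusters = find_bubble_clusters(times, max_gap=0.2)
```

With these sample values the function finds two clusters, one of three bubbles and one of two, and discards the isolated bubble. A fixed threshold is the simplest choice; a criterion that scales with bubble size or local flow conditions would slot into the same loop.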

Relevance:

40.00% 40.00%

Publisher:

Abstract:

BACKGROUND: Superinfection with drug resistant HIV strains could potentially contribute to compromised therapy in patients initially infected with drug-sensitive virus and receiving antiretroviral therapy. To investigate the importance of this potential route to drug resistance, we developed a bioinformatics pipeline to detect superinfection from routinely collected genotyping data, and assessed whether superinfection contributed to increased drug resistance in a large European cohort of viremic, drug treated patients. METHODS: We used sequence data from routine genotypic tests spanning the protease and partial reverse transcriptase regions in the Virolab and EuResist databases that collated data from five European countries. Superinfection was indicated when sequences of a patient failed to cluster together in phylogenetic trees constructed with selected sets of control sequences. A subset of the indicated cases was validated by re-sequencing pol and env regions from the original samples. RESULTS: 4425 patients had at least two sequences in the database, with a total of 13816 distinct sequence entries (of which 86% belonged to subtype B). We identified 107 patients with phylogenetic evidence for superinfection. In 14 of these cases, we analyzed newly amplified sequences from the original samples for validation purposes: only 2 cases were verified as superinfections in the repeated analyses, the other 12 cases turned out to involve sample or sequence misidentification. Resistance to drugs used at the time of strain replacement did not change in these two patients. A third case could not be validated by re-sequencing, but was supported as superinfection by an intermediate sequence with high degenerate base pair count within the time frame of strain switching. Drug resistance increased in this single patient. 
CONCLUSIONS: Routine genotyping data are informative for the detection of HIV superinfection; however, most cases of non-monophyletic clustering in patient phylogenies arise from sample or sequence mix-up rather than from superinfection, which emphasizes the importance of validation. Non-transient superinfection was rare in our mainly treatment experienced cohort, and we found a single case of possible transmitted drug resistance by this route. We therefore conclude that in our large cohort, superinfection with drug resistant HIV did not compromise the efficiency of antiretroviral treatment.

Relevance:

40.00% 40.00%

Publisher:

Abstract:

Creative industries tend to concentrate mainly around large- and medium-sized cities, forming creative local production systems. The text analyses the forces behind clustering of creative industries to provide the first empirical explanation of the determinants of creative employment clustering following a multidisciplinary approach based on cultural and creative economics, evolutionary geography and urban economics. A comparative analysis has been performed for Italy and Spain. The results show different patterns of creative employment clustering in both countries. The small role of historical and cultural endowments, the size of the place, the average size of creative industries, the productive diversity and the concentration of human capital and creative class have been found as common factors of clustering in both countries.