994 results for Hierarchical document


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Clustering is an important technique for organising and categorising web-scale documents. The main challenges in clustering the billions of documents available on the web are the processing power required and the sheer size of the datasets. More importantly, it is nigh impossible to generate labels for a general web document collection containing billions of documents and a vast taxonomy of topics, yet document clusters are most commonly evaluated by comparison to a ground-truth set of document labels. This paper presents a clustering and labeling solution in which Wikipedia is clustered and hundreds of millions of web documents in ClueWeb12 are mapped onto those clusters. The solution is based on the assumption that Wikipedia contains such a wide range of diverse topics that it represents a small-scale web. We found that it was possible to perform the web-scale document clustering and labeling process on one desktop computer in under a couple of days for the Wikipedia clustering solution containing about 1000 clusters. It takes longer to execute a solution with finer-granularity clusters, such as 10,000 or 50,000. These results were evaluated using a set of external data.
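The mapping step described in this abstract (assigning unlabeled web documents to clusters built from a reference collection) can be illustrated with a nearest-centroid sketch. The abstract does not state which representation or similarity measure the authors use, so the bag-of-words vectors, the cosine similarity, and every name and sample text below are assumptions made for illustration only.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenised text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(texts):
    """Summed term counts of all documents in one reference cluster."""
    total = Counter()
    for text in texts:
        total.update(vectorize(text))
    return total

def assign(document, centroids):
    """Return the label of the centroid most similar to the document."""
    vec = vectorize(document)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Hypothetical reference clusters standing in for clustered Wikipedia articles.
reference_clusters = {
    "astronomy": ["the telescope observed a distant galaxy",
                  "stars and planets orbit in the galaxy"],
    "cooking": ["bake the bread in a hot oven",
                "simmer the soup and season the bread"],
}
centroids = {label: centroid(texts) for label, texts in reference_clusters.items()}

print(assign("telescope images of a galaxy", centroids))
print(assign("fresh bread from the oven", centroids))
```

Because each incoming document is compared only against the cluster centroids, the cost of the mapping step grows with the number of clusters, which is consistent with the abstract's remark that finer-grained solutions take longer to execute.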

Relevance:

20.00% 20.00%

Publisher:

Abstract:

A Bitcoin wallet is a set of private keys known to a user and which allow that user to spend any Bitcoin associated with those keys. In a hierarchical deterministic (HD) wallet, child private keys are generated pseudorandomly from a master private key, and the corresponding child public keys can be generated by anyone with knowledge of the master public key. These wallets have several interesting applications including Internet retail, trustless audit, and a treasurer allocating funds among departments. A specification of HD wallets has even been accepted as Bitcoin standard BIP32. Unfortunately, in all existing HD wallets---including BIP32 wallets---an attacker can easily recover the master private key given the master public key and any child private key. This vulnerability precludes use cases such as a combined treasurer-auditor, and some in the Bitcoin community have suspected that this vulnerability cannot be avoided. We propose a new HD wallet that is not subject to this vulnerability. Our HD wallet can tolerate the leakage of up to m private keys with a master public key size of O(m). We prove that breaking our HD wallet is at least as hard as the so-called "one more" discrete logarithm problem.
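The vulnerability mentioned in this abstract follows from the additive structure of non-hardened child key derivation: the child private key is the parent private key plus an offset that anyone holding the master public key (including its chain code) can recompute. The sketch below shows that arithmetic in a toy setting. It is an assumption-laden illustration, not BIP32 itself: it replaces secp256k1 with modular exponentiation over a small prime, and the constants `P` and `G`, the serialisation, and all function names are hypothetical.

```python
import hashlib
import hmac

P = 2**127 - 1   # toy prime modulus (assumption; real wallets use the secp256k1 curve)
G = 3            # toy base element
Q = P - 1        # exponents can be reduced mod P - 1 by Fermat's little theorem

def public_key(private_key):
    """Toy public key: G raised to the private key, mod P."""
    return pow(G, private_key, P)

def tweak(chain_code, parent_public, index):
    """Derivation offset, computable from public information only."""
    message = parent_public.to_bytes(16, "big") + index.to_bytes(4, "big")
    digest = hmac.new(chain_code, message, hashlib.sha512).digest()
    return int.from_bytes(digest[:32], "big") % Q

def child_private(parent_private, chain_code, index):
    """Child private key = parent private key + offset."""
    parent_public = public_key(parent_private)
    return (parent_private + tweak(chain_code, parent_public, index)) % Q

def child_public(parent_public, chain_code, index):
    """Child public key derived without knowing any private key."""
    offset = tweak(chain_code, parent_public, index)
    return (parent_public * pow(G, offset, P)) % P

def recover_master_private(parent_public, chain_code, index, leaked_child_private):
    """The attack: subtract the publicly computable offset from a leaked child key."""
    return (leaked_child_private - tweak(chain_code, parent_public, index)) % Q

# Demo values derived from fixed strings so the sketch is reproducible.
master_private = int.from_bytes(hashlib.sha256(b"demo master seed").digest(), "big") % Q
chain_code = hashlib.sha256(b"demo chain code").digest()
master_public = public_key(master_private)

leaked = child_private(master_private, chain_code, index=7)
recovered = recover_master_private(master_public, chain_code, 7, leaked)
print(recovered == master_private)
```

The same subtraction works for any single leaked child key, which is why a wallet of this shape cannot safely hand the master public key to a party that also holds a child private key.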

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Traditional information filtering (IF) models often assume that the documents in a collection relate to only one topic. In reality, however, users' interests can be diverse and the documents in a collection often involve multiple topics. Topic modelling was proposed to generate statistical models that represent multiple topics in a collection of documents, but in a topic model, topics are represented by distributions over words, which are limited in their ability to distinctively represent the semantics of topics. Patterns are generally thought to be more discriminative than single terms and are able to reveal the inner relations between words. This paper proposes a novel information filtering model, the Significant matched Pattern-based Topic Model (SPBTM). The SPBTM represents user information needs in terms of multiple topics, and each topic is represented by patterns. More importantly, the patterns are organized into groups based on their statistical and taxonomic features, from which the more representative patterns, called Significant Matched Patterns, can be identified and used to estimate document relevance. Experiments on benchmark data sets demonstrate that the SPBTM significantly outperforms state-of-the-art models.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Existing techniques for automated discovery of process models from event logs generally produce flat process models. Thus, they fail to exploit the notion of subprocess as well as error handling and repetition constructs provided by contemporary process modeling notations, such as the Business Process Model and Notation (BPMN). This paper presents a technique for automated discovery of hierarchical BPMN models containing interrupting and non-interrupting boundary events and activity markers. The technique employs functional and inclusion dependency discovery techniques in order to elicit a process-subprocess hierarchy from the event log. Given this hierarchy and the projected logs associated to each node in the hierarchy, parent process and subprocess models are then discovered using existing techniques for flat process model discovery. Finally, the resulting models and logs are heuristically analyzed in order to identify boundary events and markers. By employing approximate dependency discovery techniques, it is possible to filter out noise in the event log arising for example from data entry errors or missing events. A validation with one synthetic and two real-life logs shows that process models derived by the proposed technique are more accurate and less complex than those derived with flat process discovery techniques. Meanwhile, a validation on a family of synthetically generated logs shows that the technique is resilient to varying levels of noise.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Purpose: This paper aims to set out a new hierarchical and differentiated model of social marketing principles, concepts and techniques that builds on, but supersedes, the existing lists of non-equivalent and undifferentiated benchmark criteria. Design/methodology/approach: This is a conceptual paper that proposes a hierarchical model of social marketing principles, concepts and techniques. Findings: This new delineation of the social marketing principle, its four core concepts and five techniques, represents a new way to conceptualize and recognize the different elements that constitute social marketing. This new model will help add to and further the development of the theoretical basis of social marketing, building on the definitional work led by the International Social Marketing Association (iSMA), Australian Association of Social Marketing (AASM) and European Social Marketing Association (ESMA). Research limitations/implications: This proposed model offers a foundation for future research to expand upon. Further research is recommended to empirically test the proposed model. Originality/value: This paper seeks to advance the theoretical base of social marketing by making a reasoned case for the need to differentiate between principles, concepts and techniques when seeking to describe social marketing.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This cross disciplinary study was conducted as two research and development projects. The outcome is a multimodal and dynamic chronicle, which incorporates the tracking of spatial, temporal and visual elements of performative practice-led and design-led research journeys. The distilled model provides a strong new approach to demonstrate rigour in non-traditional research outputs including provenance and an 'augmented web of facticity'.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Genetic correlation (rg) analysis determines how much of the correlation between two measures is due to common genetic influences. In an analysis of 4 Tesla diffusion tensor images (DTI) from 531 healthy young adult twins and their siblings, we generalized the concept of genetic correlation to determine common genetic influences on white matter integrity, measured by fractional anisotropy (FA), at all points of the brain, yielding an NxN genetic correlation matrix rg(x,y) between FA values at all pairs of voxels in the brain. With hierarchical clustering, we identified brain regions with relatively homogeneous genetic determinants, to boost the power to identify causal single nucleotide polymorphisms (SNP). We applied genome-wide association (GWA) to assess associations between 529,497 SNPs and FA in clusters defined by hubs of the clustered genetic correlation matrix. We identified a network of genes, with a scale-free topology, that influences white matter integrity over multiple brain regions.
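The grouping step in this abstract, hierarchical clustering over a pairwise genetic correlation matrix, can be sketched with a small agglomerative procedure. The abstract does not say which linkage criterion was used, so average linkage is an assumption here, and the 4x4 matrix is invented toy data rather than anything taken from the study.

```python
def average_linkage(similarity, n_clusters):
    """Merge the two most similar clusters until n_clusters remain.

    similarity is a symmetric matrix (list of lists) where larger values
    mean two items (for example, voxels) are more alike.
    """
    clusters = [[i] for i in range(len(similarity))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average similarity over all cross-cluster pairs.
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                score = sum(similarity[i][j] for i, j in pairs) / len(pairs)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(cluster) for cluster in clusters]

# Toy correlation matrix: items 0 and 1 are strongly correlated, as are 2 and 3.
toy_rg = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
print(average_linkage(toy_rg, n_clusters=2))
```

This naive version recomputes every pairwise average on each pass, so it is only practical for small matrices; a voxel-level matrix would need an optimised library implementation.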

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Modern non-invasive brain imaging technologies, such as diffusion weighted magnetic resonance imaging (DWI), enable the mapping of neural fiber tracts in the white matter, providing a basis to reconstruct a detailed map of brain structural connectivity networks. Brain connectivity networks differ from random networks in their topology, which can be measured using small worldness, modularity, and high-degree nodes (hubs). Still, little is known about how individual differences in structural brain network properties relate to age, sex, or genetic differences. Recently, some groups have reported brain network biomarkers that enable differentiation among individuals, pairs of individuals, and groups of individuals. In addition to studying new topological features, here we provide a unifying general method to investigate topological brain networks and connectivity differences between individuals, pairs of individuals, and groups of individuals at several levels of the data hierarchy, while appropriately controlling false discovery rate (FDR) errors. We apply our new method to a large dataset of high quality brain connectivity networks obtained from High Angular Resolution Diffusion Imaging (HARDI) tractography in 303 young adult twins, siblings, and unrelated people. Our proposed approach can accurately classify brain connectivity networks based on sex (93% accuracy) and kinship (88.5% accuracy). We find statistically significant differences associated with sex and kinship both in the brain connectivity networks and in derived topological metrics, such as the clustering coefficient and the communicability matrix.
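The abstract mentions controlling false discovery rate (FDR) errors across many comparisons but does not name the procedure. One widely used option is the Benjamini-Hochberg step-up procedure, sketched below as an assumption about the kind of correction involved rather than as the authors' method; the function name and the sample p-values are illustrative.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Sort the p-values, find the largest rank k with p_(k) <= alpha * k / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected

# The two smallest p-values miss their own thresholds (0.0125 and 0.025),
# but the third meets its threshold (0.0375), so all three are rejected.
print(benjamini_hochberg([0.02, 0.03, 0.035, 0.9]))
```

The step-up behaviour in the example is the point of the design: a hypothesis can be rejected because a larger p-value further down the sorted list passes its threshold, which makes the procedure less conservative than a per-test Bonferroni cut-off.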

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Developing nano/micro-structures that can effectively improve the properties of electrode materials for energy storage devices is a key research topic. Ultrathin nanosheets have proved to be one of the promising nanostructures due to their high specific surface area, good active contact areas and porous channels. Herein, we report a unique hierarchical micro-spherical morphology of well-stacked and completely miscible molybdenum disulfide (MoS2) nanosheets and graphene sheets, successfully synthesized via a simple, industrial-scale spray-drying technique to take advantage of the high practical capacity of MoS2 and the high electronic conductivity of graphene. Computational studies were performed to understand the interfacial behaviour of MoS2 and graphene; they indicate high stability of the composite, with a high interfacial binding energy (−2.02 eV) between the two. Further, the lithium and sodium storage properties were tested and reveal excellent cyclic stability over 250 and 500 cycles, respectively, with the highest initial capacity values of 1300 mAh g−1 and 640 mAh g−1 at 0.1 A g−1.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This research examined the function of Queensland Health's Root Cause Analysis (RCA) to improve patient safety through an investigation of patient harm events where permanent harm and preventable death, Severity Assessment Code 1, were the outcome of healthcare. Unedited and highly legislated RCAs from across Queensland Health public hospitals from 2009, 2010 and 2011 comprised the data. A document analysis revealed the RCAs opposed organisational policy and dominant theoretical directives. If we accept the prevailing assumption that patient harm is a systemic issue, then the RCA is failing to address harm events in healthcare.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

AIM: This study investigated the ability of an osteoconductive biphasic scaffold to simultaneously regenerate alveolar bone, periodontal ligament and cementum. MATERIALS AND METHODS: A biphasic scaffold was built by attaching a fused deposition modelled bone compartment to a melt electrospun periodontal compartment. The bone compartment was coated with a calcium phosphate (CaP) layer for increasing osteoconductivity, seeded with osteoblasts and cultured in vitro for 6 weeks. The resulting constructs were then complemented with the placement of PDL cell sheets on the periodontal compartment, attached to a dentin block and subcutaneously implanted into athymic rats for 8 weeks. Scanning electron microscopy, X-ray diffraction, alkaline phosphatase and DNA content quantification, confocal laser microscopy, micro computerized tomography and histological analysis were employed to evaluate the scaffold's performance. RESULTS: The in vitro study showed that alkaline phosphatase activity was significantly increased in the CaP-coated samples and they also displayed enhanced mineralization. In the in vivo study, significantly more bone formation was observed in the coated scaffolds. Histological analysis revealed that the large pore size of the periodontal compartment permitted vascularization of the cell sheets, and periodontal attachment was achieved at the dentin interface. CONCLUSIONS: This work demonstrates that the combination of cell sheet technology together with an osteoconductive biphasic scaffold could be utilized to address the limitations of current periodontal regeneration techniques.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Previous qualitative research has highlighted that temporality plays an important role in relevance for clinical records search. In this study, an investigation is undertaken to determine the effect that the timespan of events within a patient record has on relevance in a retrieval scenario. In addition, based on the standard practice of document length normalisation, a document timespan normalisation model that specifically accounts for timespans is proposed. Initial analysis revealed that, in general, relevant patient records tended to cover a longer timespan of events than non-relevant patient records. However, an empirical evaluation using the TREC Medical Records track supports the opposite view: that shorter documents (in terms of timespan) are better for retrieval. These findings highlight that the role of temporality in relevance is complex, and how to deal effectively with temporality within a retrieval scenario remains an open question.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Cancer is the leading contributor to the disease burden in Australia. This thesis develops and applies Bayesian hierarchical models to facilitate an investigation of the spatial and temporal associations for cancer diagnosis and survival among Queenslanders. The key objectives are to document and quantify the importance of spatial inequalities, explore factors influencing these inequalities, and investigate how spatial inequalities change over time. Existing Bayesian hierarchical models are refined, new models and methods developed, and tangible benefits obtained for cancer patients in Queensland. The versatility of Bayesian models in cancer control is clearly demonstrated through these detailed and comprehensive analyses.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Document clustering is one of the prominent methods for mining important information from the vast amount of data available on the web. However, document clustering generally suffers from the curse of dimensionality. Fortunately, in high-dimensional space, data points tend to be more concentrated in some areas of clusters. We take advantage of this phenomenon by introducing a novel concept of dynamic cluster representation named loci. Clusters' loci are efficiently calculated using documents' ranking scores generated from a search engine. We propose a fast loci-based semi-supervised document clustering algorithm that uses clusters' loci instead of conventional centroids for assigning documents to clusters. Empirical analysis on real-world datasets shows that the proposed method produces cluster solutions with promising quality and is substantially faster than several benchmarked centroid-based semi-supervised document clustering methods.