3 resultados para Clustering a large document collection

em Aston University Research Archive


Relevância:

100.00% 100.00%

Publicador:

Resumo:

The objective of this work was to design, construct and commission a new ablative pyrolysis reactor and a high efficiency product collection system. The reactor was to have a nominal throughput of 10 kg/11r of dry biomass and be inherently scalable up to an industrial scale application of 10 tones/hr. The whole process consists of a bladed ablative pyrolysis reactor, two high efficiency cyclones for char removal and a disk and doughnut quench column combined with a wet walled electrostatic precipitator, which is directly mounted on top, for liquids collection. In order to aid design and scale-up calculations, detailed mathematical modelling was undertaken of the reaction system enabling sizes, efficiencies and operating conditions to be determined. Specifically, a modular approach was taken due to the iterative nature of some of the design methodologies, with the output from one module being the input to the next. Separate modules were developed for the determination of the biomass ablation rate, specification of the reactor capacity, cyclone design, quench column design and electrostatic precipitator design. These models enabled a rigorous design protocol to be developed capable of specifying the required reactor and product collection system size for specified biomass throughputs, operating conditions and collection efficiencies. The reactor proved capable of generating an ablation rate of 0.63 mm/s for pine wood at a temperature of 525 'DC with a relative velocity between the heated surface and reacting biomass particle of 12.1 m/s. The reactor achieved a maximum throughput of 2.3 kg/hr, which was the maximum the biomass feeder could supply. The reactor is capable of being operated at a far higher throughput but this would require a new feeder and drive motor to be purchased. Modelling showed that the reactor is capable of achieving a reactor throughput of approximately 30 kg/hr. This is an area that should be considered for the future as the reactor is currently operating well below its theoretical maximum. Calculations show that the current product collection system could operate efficiently up to a maximum feed rate of 10 kg/Fir, provided the inert gas supply was adjusted accordingly to keep the vapour residence time in the electrostatic precipitator above one second. Operation above 10 kg/hr would require some modifications to the product collection system. Eight experimental runs were documented and considered successful, more were attempted but due to equipment failure had to be abandoned. This does not detract from the fact that the reactor and product collection system design was extremely efficient. The maximum total liquid yield was 64.9 % liquid yields on a dry wood fed basis. It is considered that the liquid yield would have been higher had there been sufficient development time to overcome certain operational difficulties and if longer operating runs had been attempted to offset product losses occurring due to the difficulties in collecting all available product from a large scale collection unit. The liquids collection system was highly efficient and modeling determined a liquid collection efficiency of above 99% on a mass basis. This was validated due to the fact that a dry ice/acetone condenser and a cotton wool filter downstream of the collection unit enabled mass measurements of the amount of condensable product exiting the product collection unit. This showed that the collection efficiency was in excess of 99% on a mass basis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

It is important to help researchers find valuable papers from a large literature collection. To this end, many graph-based ranking algorithms have been proposed. However, most of these algorithms suffer from the problem of ranking bias. Ranking bias hurts the usefulness of a ranking algorithm because it returns a ranking list with an undesirable time distribution. This paper is a focused study on how to alleviate ranking bias by leveraging the heterogeneous network structure of the literature collection. We propose a new graph-based ranking algorithm, MutualRank, that integrates mutual reinforcement relationships among networks of papers, researchers, and venues to achieve a more synthetic, accurate, and less-biased ranking than previous methods. MutualRank provides a unified model that involves both intra- and inter-network information for ranking papers, researchers, and venues simultaneously. We use the ACL Anthology Network as the benchmark data set and construct the gold standard from computer linguistics course websites of well-known universities and two well-known textbooks. The experimental results show that MutualRank greatly outperforms the state-of-the-art competitors, including PageRank, HITS, CoRank, Future Rank, and P-Rank, in ranking papers in both improving ranking effectiveness and alleviating ranking bias. Rankings of researchers and venues by MutualRank are also quite reasonable.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Web document cluster analysis plays an important role in information retrieval by organizing large amounts of documents into a small number of meaningful clusters. Traditional web document clustering is based on the Vector Space Model (VSM), which takes into account only two-level (document and term) knowledge granularity but ignores the bridging paragraph granularity. However, this two-level granularity may lead to unsatisfactory clustering results with “false correlation”. In order to deal with the problem, a Hierarchical Representation Model with Multi-granularity (HRMM), which consists of five-layer representation of data and a twophase clustering process is proposed based on granular computing and article structure theory. To deal with the zero-valued similarity problemresulted from the sparse term-paragraphmatrix, an ontology based strategy and a tolerance-rough-set based strategy are introduced into HRMM. By using granular computing, structural knowledge hidden in documents can be more efficiently and effectively captured in HRMM and thus web document clusters with higher quality can be generated. Extensive experiments show that HRMM, HRMM with tolerancerough-set strategy, and HRMM with ontology all outperform VSM and a representative non VSM-based algorithm, WFP, significantly in terms of the F-Score.