962 results for Clustering a large document collection
Abstract:
Document ranking is an important process in information retrieval (IR). It presents retrieved documents in order of their estimated degree of relevance to the query. Traditional document ranking methods are mostly based on similarity computations between documents and the query. In this paper we argue that similarity-based document ranking is insufficient in some cases, for two reasons. The first is the increased variety of information: far too many different types of documents are now available for users to search. The second is the variety of users: in many cases a user may want to retrieve documents that are not only similar but also general or broad regarding a certain topic. This is particularly the case in some domains, such as bio-medical IR. We propose a novel approach that re-ranks the retrieved documents by combining their similarity with their generality. Document generality can be quantified by an ontology-based analysis of the semantic cohesion of the text. The retrieved documents are then re-ranked by a combined score of their similarity and the closeness of each document’s generality to the query’s. Our experiments have shown encouraging performance on a large bio-medical document collection, OHSUMED, containing 348,566 medical journal references and 101 test queries.
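The re-ranking idea in the abstract above can be sketched roughly as follows. The abstract does not say how the two scores are combined, so the linear weighting `alpha`, the [0, 1] score ranges, and all names here (`Doc`, `rerank`, `query_generality`) are assumptions made purely for illustration, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    similarity: float  # query-document similarity, assumed to lie in [0, 1]
    generality: float  # document generality score, assumed to lie in [0, 1]


def rerank(docs, query_generality, alpha=0.5):
    """Sort docs by a weighted mix of similarity and generality closeness.

    alpha is a hypothetical weight: 1.0 means pure similarity ranking,
    0.0 means ranking only by closeness of generality to the query's.
    """
    def combined(doc):
        # Closeness is 1 when the document is exactly as general as the query.
        closeness = 1.0 - abs(doc.generality - query_generality)
        return alpha * doc.similarity + (1.0 - alpha) * closeness

    return sorted(docs, key=combined, reverse=True)


# Hypothetical scores, chosen only to show the ranking changing.
docs = [Doc("d1", 0.9, 0.1), Doc("d2", 0.8, 0.7), Doc("d3", 0.5, 0.75)]
ranked = rerank(docs, query_generality=0.7)
by_similarity_only = rerank(docs, query_generality=0.7, alpha=1.0)
```

With these made-up scores, the most similar document (`d1`) drops below the two documents whose generality is closer to the query's, whereas `alpha=1.0` reproduces a plain similarity ranking.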
Abstract:
Dissertation submitted for the degree of Master in Computer Engineering
Abstract:
Paper presented at the Cross-Language Evaluation Forum (CLEF 2008), Aarhus, Denmark, September 17-19, 2008.
Abstract:
The objective of this work was to design, construct and commission a new ablative pyrolysis reactor and a high-efficiency product collection system. The reactor was to have a nominal throughput of 10 kg/hr of dry biomass and be inherently scalable up to an industrial-scale application of 10 tonnes/hr. The whole process consists of a bladed ablative pyrolysis reactor, two high-efficiency cyclones for char removal, and a disk and doughnut quench column combined with a wet-walled electrostatic precipitator, mounted directly on top, for liquids collection. To aid design and scale-up calculations, detailed mathematical modelling of the reaction system was undertaken, enabling sizes, efficiencies and operating conditions to be determined. Specifically, a modular approach was taken because of the iterative nature of some of the design methodologies, with the output from one module being the input to the next. Separate modules were developed for the determination of the biomass ablation rate, specification of the reactor capacity, cyclone design, quench column design and electrostatic precipitator design. These models enabled a rigorous design protocol to be developed, capable of specifying the required reactor and product collection system size for specified biomass throughputs, operating conditions and collection efficiencies. The reactor proved capable of generating an ablation rate of 0.63 mm/s for pine wood at a temperature of 525 °C with a relative velocity between the heated surface and the reacting biomass particle of 12.1 m/s. The reactor achieved a maximum throughput of 2.3 kg/hr, which was the maximum the biomass feeder could supply. The reactor is capable of being operated at a far higher throughput, but this would require a new feeder and drive motor to be purchased. Modelling showed that the reactor is capable of achieving a throughput of approximately 30 kg/hr. This is an area that should be considered for the future, as the reactor is currently operating well below its theoretical maximum. Calculations show that the current product collection system could operate efficiently up to a maximum feed rate of 10 kg/hr, provided the inert gas supply was adjusted accordingly to keep the vapour residence time in the electrostatic precipitator above one second. Operation above 10 kg/hr would require some modifications to the product collection system. Eight experimental runs were documented and considered successful; more were attempted but had to be abandoned due to equipment failure. This does not detract from the fact that the reactor and product collection system design was extremely efficient. The maximum total liquid yield was 64.9% on a dry wood fed basis. It is considered that the liquid yield would have been higher had there been sufficient development time to overcome certain operational difficulties, and had longer operating runs been attempted to offset product losses arising from the difficulty of collecting all available product from a large-scale collection unit. The liquids collection system was highly efficient, and modelling determined a liquid collection efficiency of above 99% on a mass basis. This was validated using a dry ice/acetone condenser and a cotton wool filter downstream of the collection unit, which enabled mass measurements of the condensable product exiting the product collection unit. These measurements showed that the collection efficiency was in excess of 99% on a mass basis.
Abstract:
Due to the rapid advances in computing and sensing technologies, enormous amounts of data are being generated every day in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets and discover hidden patterns. For both data mining and visualization to be effective, it is important to include visualization techniques in the mining process and to present the discovered patterns in a more comprehensive visual view. In this dissertation, four related problems are studied to explore the integration of data mining and data visualization: dimensionality reduction for visualizing high-dimensional datasets, visualization-based clustering evaluation, interactive document mining, and multiple clusterings exploration. In particular, we 1) propose an efficient feature selection method (ReliefF + mRMR) for preprocessing high-dimensional datasets; 2) present DClusterE to integrate cluster validation with user interaction and provide rich visualization tools for users to examine document clustering results from multiple perspectives; 3) design two interactive document summarization systems that involve users’ efforts and generate customized summaries from 2D sentence layouts; and 4) propose a new framework that organizes the different input clusterings into a hierarchical tree structure and allows for interactive exploration of multiple clustering solutions.
Abstract:
Relevant past events can be remembered when viewing related pictures. The main difficulty is how to find these photos in a large personal collection. Query definition and image annotation are key issues in overcoming this problem. The former matters because of the diversity of the clues provided by our memory when recovering a past moment, and the latter because images need to be annotated with information regarding those clues in order to be retrieved. Consequently, tools to recover past memories should deal carefully with these two tasks. This paper describes a user interface designed to explore pictures from personal memories. Users can query the media collection in several ways, and for this reason an iconic visual language to define queries is proposed. Automatic and semi-automatic annotation is also performed using the image content and the audio information obtained when users show their images to others. The paper also presents the user interface evaluation based on tests with 58 participants.
Abstract:
Background: Calcified chronic subdural hematoma is a rare but known entity, estimated to represent 0.3-2.7% of chronic subdural hematomas. Although surgical treatment is universally accepted for chronic subdural hematomas, there is some doubt about applying it to calcified chronic subdural hematomas. Case Description: We report the case of a 73‑year‑old male presenting with an 18-month history of deteriorating motor function in his right limbs. Computed tomography (CT) scans and magnetic resonance imaging (MRI) documented a large subdural collection over the left hemisphere with a calcified inner membrane, which was successfully and completely removed, with progressive clinical and radiological improvement. Conclusions: We report a case in which this type of rare lesion was removed with progressive and complete resolution of the patient’s symptoms, restoring his previous neurological condition. From the cases described in the literature and our own experience with this case, we think surgical treatment in these patients, when symptomatic, is necessary and viable, frequently resulting in the patient’s improvement.
Abstract:
For the 2004 strategic planning process at Iowa Workforce Development, Director Richard Running asked for as much input from all staff as possible. As a result, planning staff designed an extensive process to gather input over a period of about three months during the late spring and summer:
• A Guide to Staff Involvement was drafted and distributed to staff in offices throughout the state. This guide provided a brief explanation of the planning process and quoted extensively from the Vilsack/Pederson Leadership Agenda and the 2003 IWD strategic plan to illustrate each step and to show examples of alignment. The guide also provided suggestions for staff in various locations and work units to conduct their own planning sessions. The structure was designed to solicit feedback regarding elements (vision, mission, guiding principles, goals and strategies) of the existing 2003 plan. Particular attention was devoted to securing non-management staff’s perspective during the internal and external assessment exercises.
• Several local offices conducted their own structured input sessions following the suggested guidelines and sent the results to planning staff in the central administrative offices.
• Other work units in many locations opted to ask planning staff to facilitate planning sessions for them. The results of these sessions were also gathered by planning staff.
In all, dozens of input sessions were held and hundreds of IWD staff participated directly in the process. Because all the sessions followed similar guidelines, it was relatively easy to combine all of the input received and spot common themes that surfaced from the many sessions. A composite of all the flip chart notes was compiled into one large document (for those who like lots of detail) and another document summarized the key themes that emerged. This information was used in a day-long planning retreat on August 20. Management staff members from throughout the department were invited, and each work unit and sub-state region also brought a non-management staff person. This group reviewed the themes from the earlier sessions and then addressed each element of the 2003 plan, proposing refinements for almost all sections. Subsequently, senior management reviewed the results of the retreat and made the final decisions for the new 2004 plan. This thorough approach, with its special emphasis on input from line staff, resulted in some significant changes to IWD’s plan. Local office staff, for example, consistently expressed the need to step up our marketing efforts, especially with employers. Another need that was expressed clearly and often was to strengthen staff training efforts, much of the capacity for which had been lost in budget and staff reductions a few years ago. Neither of these issues is new, but the degree of concern expressed by IWD staff has caused us to elevate their importance in this year’s plan.
Abstract:
Pursuant to Iowa Code 216A, subchapter 9, CJJP is required to issue an annual report containing long-range systems goals, special issue planning recommendations and research findings. CJJP’s 1998 response to its reporting requirement is replicated in the manner of the distribution of the 1997 Update. Again this year, CJJP is issuing one large document which contains many separate reports. Single-issue 1998 Update reports will be made available based on reader interest and need. When CJJP used this approach to disseminating its research and reports in 1997, it proved to be cost effective and responsive to the planning activities and information needs of Iowa’s policy makers, justice system officials and others.
Abstract:
Pursuant to Iowa Code 216A, subchapter 9, CJJP is required to issue an annual report containing long-range system goals, special issue planning recommendations and research findings. CJJP’s 1997 response to its reporting requirement is different from past years. Rather than issuing one large document containing many separate reports, single-issue 1997 Update reports now are being made available based on reader interest and need. It is hoped this approach to disseminating CJJP research and planning reports will be more cost effective and more responsive to the planning activities and information needs of Iowa’s policy makers, justice system officials and others.
Abstract:
Pursuant to Iowa Code 216A, subchapter 9, CJJP is required to issue an annual report containing long-range systems goals, special issue planning recommendations and research findings. CJJP’s 1998 response to its reporting requirement is replicated in the manner of the distribution of the 1997 Update. Again this year, CJJP is issuing one large document which contains many separate reports. Single-issue 1998 Update reports will be made available based on reader interest and need.
Abstract:
Control of the world-wide spread of methicillin-resistant Staphylococcus aureus (MRSA) has been unsuccessful in most developed countries. A few countries have been able to maintain a low MRSA prevalence, plausibly due to their strict MRSA control policies. Such policies require wide-scale screening of patients with suspected MRSA colonization, in order to nurse the MRSA-positive patients in contact isolation. The aim of this study was to develop and introduce a 2-photon excited fluorescence detection (TPX) technique for screening of MRSA directly from clinical samples. The assay principle involves specific online immunometric monitoring of S. aureus growth under selective antibiotic pressure. After the novel TPX approach had been set up, its applicability for the detection of MRSA was evaluated using a large MRSA collection including practically all epidemic MRSA strains identified in Finland between 1991 and 2009. The TPX assay was found to be both sensitive (97.9%) and specific (94.1%) in this epidemiological setting, illustrating that the method is tolerant to wide biological variation as well as to environments with rapidly emerging MRSA strains. When MRSA was screened directly from colonization samples, all patients positive for MRSA by conventional methods were also positive by the TPX assay. The assay capacity was 48 samples per test run, and the median time required for confirmation of a true-positive screening test result was 3 h 26 min. Collectively, the findings presented in this thesis suggest that the TPX MRSA screening assay could be applicable for direct screening of MRSA colonization samples without any prior isolation steps. This could mean that contact isolation of suspected carriers testing negative could be discontinued earlier, thereby reducing the costs and burden associated with the containment of MRSA. In case of infection, a positive test result would ensure an early onset of effective therapy.
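For readers unfamiliar with the two figures quoted in the abstract above, sensitivity and specificity follow the standard definitions sketched here. The counts in the usage lines are hypothetical placeholders, not data from the thesis, and the function names are chosen only for this illustration.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of truly positive samples that the assay flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of truly negative samples that the assay reports as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, for illustration only (not taken from the study).
example_sensitivity = sensitivity(true_pos=95, false_neg=5)
example_specificity = specificity(true_neg=80, false_pos=20)
```

A highly sensitive screening assay rarely misses carriers, while a highly specific one rarely places non-carriers in isolation unnecessarily.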
Abstract:
This qualitative case study explored the process of implementing Experiential Education (EXED) in Yukon Territory Kindergarten to Grade 12 (K-12) schools with a particular focus on investigating: (a) understandings of EXED and the drivers behind its implementation, (b) factors contributing to EXED’s suitability for Yukon schools, and (c) factors supporting and challenging the implementation of EXED in Yukon schools. Data collection involved interviews with Yukon Department of Education (YDE) staff members, principals and teachers, document collection, and reflective note collection. Findings indicated that EXED was understood as more of a methodology than a philosophy for teaching and learning. EXED implementation was primarily driven by bottom-up (school/teacher) initiatives and was secondarily supported by top-down (YDE) efforts. The process of implementation was supported by three main factors and was challenged primarily by six factors. The results also pointed to three factors that made EXED suitable for implementation in Yukon schools.
Abstract:
This qualitative case study explored the process of implementing Experiential Education (EXED) in Yukon Territory Kindergarten to Grade 12 (K-12) schools with a particular focus on investigating: (a) understandings of EXED and the drivers behind its implementation, (b) factors contributing to EXED's suitability for Yukon schools, and (c) factors supporting and challenging the implementation of EXED in Yukon schools. Data collection involved interviews with Yukon Department of Education (YDE) staff members, principals and teachers, document collection, and reflective note collection. Findings indicated that EXED was understood as more of a methodology than a philosophy for teaching and learning. EXED implementation was primarily driven by bottom-up (school/teacher) initiatives and was secondarily supported by top-down (YDE) efforts. The process of implementation was supported by three main factors and was challenged primarily by six factors. The results also pointed to three factors that made EXED suitable for implementation in Yukon schools.
Abstract:
This research explored environmental sustainability (ES) initiatives at five top-ranked Ontario golf courses that were members of the Audubon Cooperative Sanctuary Program for Golf (ACSP). Research Questions: (1) How are golf courses adapting to safeguard the natural environment? (2) Why are golf courses moving, or not moving, toward ES? and (3) What barriers to ES arise in golf and how can they be overcome; what role does communication play? Overall, the research was framed with an adaptation of the dimensions of convergence by Houlihan (2012), including the motives, inputs, implementation, momentum, and impact. Additionally, impression management and message framing constructs were used to address the issue of communicating ES initiatives. Data collection involved in-depth interviews, observations, and unobtrusive document collection. Environmental aspects of the examination were guided by the Canadian Standards Association (CSA) Requirements and Guidance for Organizers of Sustainable Events and Sustainable Sport and Event Toolkit (SSET).