3 results for "Clustering a large document collection"

in Biblioteca Digital da Produção Intelectual da Universidade de São Paulo


Relevance:

30.00%

Abstract:

XML similarity evaluation has become a central issue in the database and information communities, with applications ranging over document clustering, version control, data integration and ranked retrieval. Various algorithms for comparing hierarchically structured data, XML documents in particular, have been proposed in the literature. Most of them make use of techniques for finding the edit distance between tree structures, XML documents being commonly modeled as Ordered Labeled Trees. Yet, a thorough investigation of current approaches led us to identify several similarity aspects, i.e., sub-tree related structural and semantic similarities, which are not sufficiently addressed when comparing XML documents. In this paper, we provide an integrated and fine-grained comparison framework to deal with both structural and semantic similarities in XML documents (detecting the occurrences and repetitions of structurally and semantically similar sub-trees), and to allow the end-user to adjust the comparison process according to her requirements. Our framework consists of four main modules for (i) discovering the structural commonalities between sub-trees, (ii) identifying sub-tree semantic resemblances, (iii) computing tree-based edit operation costs, and (iv) computing tree edit distance. Experimental results demonstrate higher comparison accuracy with respect to alternative methods, while timing experiments reflect the impact of semantic similarity on overall system performance.
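
As an illustration of what an edit distance between Ordered Labeled Trees looks like, the sketch below uses a simple top-down (Selkow-style) variant. It is not the four-module framework the abstract describes: it compares element tags only, ignores text content and semantic similarity, and assumes unit costs (relabeling a node costs 1, inserting or deleting a subtree costs its node count). The function names `size` and `tree_distance` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def size(node):
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(child) for child in node)


def tree_distance(a, b):
    """Top-down (Selkow-style) edit distance between two ordered labeled trees.

    Assumed costs: relabel = 1, insert/delete of a subtree = its node count.
    """
    # Relabeling the root costs 1 when the tags differ.
    root_cost = 0 if a.tag == b.tag else 1
    kids_a, kids_b = list(a), list(b)
    m, n = len(kids_a), len(kids_b)

    # d[i][j]: cost of turning the first i children of a into the first j of b.
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + size(kids_a[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + size(kids_b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(kids_a[i - 1]),  # delete a child subtree
                d[i][j - 1] + size(kids_b[j - 1]),  # insert a child subtree
                d[i - 1][j - 1]
                + tree_distance(kids_a[i - 1], kids_b[j - 1]),  # match children
            )
    return root_cost + d[m][n]


# Hypothetical sample documents.
doc1 = ET.fromstring("<paper><title/><author/><abstract/></paper>")
doc2 = ET.fromstring("<paper><title/><abstract/><year/></paper>")
print(tree_distance(doc1, doc2))
```

A smaller distance indicates more similar structure, so a matrix of pairwise distances could feed a clustering step. The recursion recomputes sub-results, which is acceptable for small documents but would need memoization for larger ones.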

Relevance:

30.00%

Abstract:

Background: The CUPID (Cultural and Psychosocial Influences on Disability) study was established to explore the hypothesis that common musculoskeletal disorders (MSDs) and associated disability are importantly influenced by culturally determined health beliefs and expectations. This paper describes the methods of data collection and various characteristics of the study sample. Methods/Principal Findings: A standardised questionnaire covering musculoskeletal symptoms, disability and potential risk factors was used to collect information from 47 samples of nurses, office workers, and other (mostly manual) workers in 18 countries from six continents. In addition, local investigators provided data on economic aspects of employment for each occupational group. Participation exceeded 80% in 33 of the 47 occupational groups, and after pre-specified exclusions, analysis was based on 12,426 subjects (92 to 1018 per occupational group). As expected, there was high usage of computer keyboards by office workers, while nurses had the highest prevalence of heavy manual lifting in all but one country. There was substantial heterogeneity between occupational groups in economic and psychosocial aspects of work; three- to fivefold variation in awareness of someone outside work with musculoskeletal pain; and more than tenfold variation in the prevalence of adverse health beliefs about back and arm pain, and in awareness of terms such as "repetitive strain injury" (RSI). Conclusions/Significance: The large differences in psychosocial risk factors (including knowledge and beliefs about MSDs) between occupational groups should allow the study hypothesis to be addressed effectively.

Relevance:

30.00%

Abstract:

Aims. Spectroscopic, polarimetric, and high spectral resolution interferometric data covering the period 1995-2011 are analyzed to document the transition into a new phase of circumstellar disk activity in the classical Be-shell star 48 Lib. The objective is to use this broad data set to additionally test disk oscillations as the basic underlying dynamical process. Methods. The long-term disk evolution is described using the V/R ratio of the violet and red emission components of Hα and Brγ, radial velocities and profiles of He I and optical metal shell lines, as well as multi-band BVRI polarimetry. Single-epoch broad-band and high-resolution interferometric visibilities and phases are discussed with respect to a classical disk model and the given baseline orientations. Results. Spectroscopic signatures of disk asymmetries in 48 Lib vanished in the late nineties but recovered some time between 2004 and 2007, as shown by a new large-amplitude and long-duration V/R cycle. Variations in the radial velocity and line profile of conventional shell lines correlate with the V/R behavior. They are shared by narrow absorption cores superimposed on otherwise seemingly photospheric He I lines, which may form in high-density gas at the inner disk close to the photosphere. Large radial velocity variations continued also during the V/R-quiet years, suggesting that V/R may not always be a good indicator of global density waves in the disk. The comparison of the polarization after the recovery of the V/R activity shows a slight increase, while the polarization angle has been constant for more than 20 years, placing tight limits on any 3-D precession or warping of the disk. The broad H-band interferometry gives a disk diameter of (1.72 ± 0.2) mas (equivalent to 15 stellar radii), position angle of the disk (50 ± 9)° and a relatively low disk flattening of 1.66 ± 0.3. Within the errors the same disk position angle is derived from polarimetric observations and from photocenter shifts across Brγ. The high-resolution interferometric visibility and phase profiles show a double or even multiple-component structure. A preliminary estimate based on the size of the Brγ emitting region indicates a large diameter for the disk (tens of stellar radii). Overall, no serious contradiction between the observations and the disk-oscillation model could be construed.