In recent years we have witnessed a change in the way information is made available online. The emergence of the web for everyone has made it easy to edit, publish and share information, leading to a considerable increase in its volume. Systems quickly emerged that allow this information to be collected and shared and that, besides letting users collect resources, also let them describe those resources with tags or comments. The automatic organisation of this information is one of the greatest challenges of the current web. Although several clustering algorithms exist, the trade-off between effectiveness (forming groups that make sense) and efficiency (running in acceptable time) is hard to find. Accordingly, this research investigates whether an automatic document clustering system becomes more effective when a social classification system is integrated into it. We analysed and discussed two methods based on the k-means algorithm for document clustering that allow social tagging to be integrated into the process. The first integrates the tags directly into the Vector Space Model (VSM), and the second uses the tags to select the initial seeds. The first method allows tags to be weighted according to their occurrence in the document through the Social Slider (SS) parameter. This method was built on a prediction model which suggests that, when cosine similarity is used, documents that share tags move closer together, whereas documents that do not share tags move further apart. The second method gave rise to an algorithm that we call k-C. Besides selecting the initial seeds through a tag network, it also changes the way the new centroids are computed in each iteration.
The change to the centroid computation took into account a reflection on the use of Euclidean distance and cosine similarity in the k-means clustering algorithm. For the evaluation of the algorithms, two algorithms were proposed: the "automatic ground truth" algorithm and the MCI algorithm. The first detects the structure of the data when it is unknown, and the second is an internal evaluation measure based on the cosine similarity between each document and its nearest document. The analysis of preliminary results suggests that the first method, integrating tags into the VSM, has more impact on the k-means algorithm than on the k-C algorithm. In addition, the results show no correlation between the choice of the SS parameter and the quality of the clusters. The remaining tests were therefore conducted using only the k-C algorithm (without integrating tags into the VSM), and the results indicate that this algorithm tends to generate more effective clusters.
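
The idea behind the first method, that shared tags pull documents together under cosine similarity while unshared tags push them apart, can be illustrated with a minimal sketch. It assumes that the Social Slider acts as a scalar weight applied to each tag before it is added to the term vector; the function names `build_vector` and `cosine` and the `tag:` prefix are hypothetical and are not taken from the abstract.

```python
import math
from collections import Counter


def build_vector(terms, tags, social_slider):
    """Bag-of-words vector in which every tag adds `social_slider` weight.

    Assumption: the Social Slider is a plain multiplier on tag occurrences.
    """
    vec = Counter(terms)
    for tag in tags:
        # Prefix tags so they never collide with ordinary terms.
        vec["tag:" + tag] += social_slider
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(weight * b[key] for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


doc_a = ["cluster", "web", "data"]
doc_b = ["cluster", "music", "data"]

no_tags = cosine(build_vector(doc_a, [], 0.0), build_vector(doc_b, [], 0.0))
same_tag = cosine(build_vector(doc_a, ["ml"], 2.0), build_vector(doc_b, ["ml"], 2.0))
other_tag = cosine(build_vector(doc_a, ["ml"], 2.0), build_vector(doc_b, ["jazz"], 2.0))

print(no_tags, same_tag, other_tag)
```

In this toy example the similarity with a shared tag is higher than the text-only similarity, and the similarity with different tags is lower, which is the behaviour the prediction model describes.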


Cost-effective semantic description and annotation of shared knowledge resources has always been of great importance for digital libraries and large-scale information systems in general. With the emergence of the Social Web and Web 2.0 technologies, a more effective semantic description and annotation of digital library contents, e.g., through folksonomies, is envisioned to take place in collaborative and personalised environments. However, there is a lack of foundation and mathematical rigour for coping with contextualised management and retrieval of semantic annotations throughout their evolution, as well as with diversity in users and user communities. In this paper, we propose an ontological foundation for semantic annotations of digital libraries in terms of flexonomies. The proposed theoretical model relies on a high-dimensional space with algebraic operators for contextualised access to semantic tags and annotations. The proposed set of algebraic operators is an adaptation of the set-theoretic operators of database theory: selection, projection, difference, intersection and union. To this extent, the proposed model is meant to lay the ontological foundation for a Digital Library 2.0 project in terms of geometric spaces rather than logic-based (description) formalisms, as a more efficient and scalable solution to the semantic annotation problem at large scale.


The evolution of new technology and its increasing use have for some years been making informal learning more and more visible, especially among young and older adults in both Higher Education and workplace contexts. However, the nature of formal and non-formal, course-based approaches to learning has made it hard to accommodate these informal processes satisfactorily, and although technology brings us closer to a solution, one has not yet been achieved. The TRAILER project aims to address this problem by developing a tool for the management of competences and skills acquired through informal learning experiences, both from the perspective of the user and from that of the institution or company. This paper describes the main lines of research and development of this project.


In this class, we will discuss metadata as well as current phenomena such as tagging and folksonomies. Readings: Ontologies Are Us: A Unified Model of Social Networks and Semantics, P. Mika, International Semantic Web Conference, 522-536, 2005. Optional: Folksonomies: power to the people, E. Quintarelli, ISKO Italy-UniMIB Meeting, (2005)


‘Bilingual’ documents, with text in both Demotic and Greek, can be of several sorts, ranging from complete translations of the same information (e.g. Ptolemaic decrees) to those where the information presented in the two languages is complementary (e.g. mummy labels). The texts discussed in this paper consist of a number of examples of financial records where a full account in one language (L1) is annotated with brief pieces of information in a second language (L2). These L2 ‘tags’ are designed to facilitate extraction of summary data at another level of the administration, functioning in a different language, and probably also to make the document accessible to those who are not literate in the L1.


Tagging supports the retrieval and categorization of online content, depending on users' tag choices. A number of models of tagging behaviour have been proposed to identify factors that are considered to affect taggers, such as users' tagging history. In this paper, we use Semiotic Analysis and Activity Theory to study the effect the system designer has on tagging behaviour. The framework we use shows the components that make up a tagging system and how they interact to direct tagging behaviour. We applied the framework to the components of two collaborative tagging systems, CiteULike and Delicious. Using datasets from both systems, we found that 35% of CiteULike users did not provide tags, compared to only 0.1% of Delicious users. This was directly linked to the type of tools the system designer used to support tagging.


The Hyades stream has long been thought to be a dispersed vestige of the Hyades cluster. However, recent analyses of the parallax distribution, of the mass function, and of the action-space distribution of stream stars have shown it to be rather composed of orbits trapped at a resonance of a density disturbance. This resonant scenario should leave a clearly different signature in the element abundances of stream stars than the dispersed cluster scenario, since the Hyades cluster is chemically homogeneous. Here, we study the metallicity as well as the element abundances of Li, Na, Mg, Fe, Zr, Ba, La, Ce, Nd and Eu for a random sample of stars belonging to the Hyades stream, and compare them with those of stars from the Hyades cluster. From this analysis: (i) we independently confirm that the Hyades stream cannot be solely composed of stars originating in the Hyades cluster; (ii) we show that some stars (namely 2/21) from the Hyades stream nevertheless have abundances compatible with an origin in the cluster; (iii) we emphasize that the use of Li as a chemical tag of the cluster origin of main-sequence stars is very efficient in the range 5500 K ≤ T_eff ≤ 6200 K, since the Li sequence in the Hyades cluster is very tight, while at the same time spanning a large abundance range; (iv) we show that, while this evaporated population has a metallicity excess of ~0.2 dex with respect to the local thin-disc population, identical to that of the Hyades cluster, the remainder of the Hyades stream population has still a metallicity excess of ~0.06-0.15 dex, consistent with an origin in the inner Galaxy; and (v) we show that the Hyades stream can be interpreted as an inner 4:1 resonance of the spiral pattern: this then also reproduces an orbital family compatible with the Sirius stream, and places the origin of the Hyades stream up to 1 kpc inwards from the solar radius, which might explain the observed metallicity excess of the stream population.


This paper evaluates six commonly available part-of-speech tagging tools over corpora other than those on which they were originally trained. In particular, this investigation measures the performance of the selected tools over varying styles and genres of text without retraining, under the assumption that domain-specific training data is not always available. An investigation is performed to determine whether improved results can be achieved by combining the tagging tools into ensembles that use voting schemes to determine the best tag for each word. It is found that, while accuracy drops due to non-domain-specific training and tag mapping between corpora, accuracy remains very high, with the support vector machine-based tagger and the decision tree-based tagger performing best over different corpora. It is also found that an ensemble containing a support vector machine-based tagger, a probabilistic tagger, a decision tree-based tagger and a rule-based tagger produces the largest increase in accuracy and the largest reduction in error across different corpora, using the Precision-Recall voting scheme.
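
The ensemble idea above, several taggers voting on the tag of each word, can be sketched as follows. This is plain (optionally weighted) voting under the assumption that all taggers emit one tag per token from a common tag set; it is not the Precision-Recall voting scheme named in the abstract, whose details are not given here. The function name `vote` and the sample tags are hypothetical.

```python
from collections import Counter


def vote(tag_sequences, weights=None):
    """Combine per-token tags from several taggers by (weighted) voting.

    tag_sequences: one list of tags per tagger, all of the same length.
    weights: optional per-tagger weights; equal weights if omitted.
    Ties go to the tag proposed by the earliest tagger in the list.
    """
    if weights is None:
        weights = [1.0] * len(tag_sequences)
    combined = []
    for token_tags in zip(*tag_sequences):
        scores = Counter()
        for tag, weight in zip(token_tags, weights):
            scores[tag] += weight
        combined.append(max(scores, key=scores.get))
    return combined


# Hypothetical outputs of three taggers for the same three tokens.
tagger_outputs = [
    ["DT", "NN", "VB"],
    ["DT", "VB", "VB"],
    ["DT", "NN", "NN"],
]
print(vote(tagger_outputs))
print(vote(tagger_outputs, weights=[1.0, 3.0, 1.0]))
```

Passing weights is one simple way to let a tagger that is known to perform better on a given corpus count for more than the others.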


The movements of 8 green turtles Chelonia mydas in Brazilian coastal waters were tracked using transmitters linked to the Argos system for periods of between 1 and 197 d. These were the first tracking data gathered on juveniles of this species in this important foraging ground. Information was integrated with that collected over a decade using traditional flipper-tagging methods at the same site. Both satellite telemetry and flipper tagging suggested that turtles undertook 1 of 3 general patterns of behaviour: pronounced long range movements (>100 km), moderate range movements (<100 km) or extended residence very close to the capture/release site. There seemed to be a general tendency for the turtles recaptured/tracked further afield to have been among the larger turtles captured. Satellite tracking of 5 turtles which moved from the release site showed that they moved through coastal waters, a factor which is likely to predispose migrating turtles to incidental capture as a result of the prevailing fishing methods in the region. The movements of the 3 turtles that travelled less than 100 km from the release site challenge previous ideas relating to home range in green turtles feeding in seagrass pastures. We hypothesise that there may be a fundamental difference in the pattern of habitat utilisation by larger green turtles depending on whether they are feeding on seagrass or macroalgae. Extended tracking of 2 small turtles which stayed near the release point showed that small juvenile turtles, whilst in residence in a particular feeding ground, can also exhibit high levels of site-fidelity with home ranges of the order of several square kilometers.


Tagging recommender systems allow Internet users to annotate resources with personalized tags. The connection among users, resources and these annotations, often called a folksonomy, permits users the freedom to explore tags, and to obtain recommendations. Releasing these tagging datasets accelerates both commercial and research work on recommender systems. However, adversaries may re-identify a user and her/his sensitive information from the tagging dataset using a little background information. Recently, several private techniques have been proposed to address the problem, but most of them lack a strict privacy notion, and can hardly resist the number of possible attacks. This paper proposes a private releasing algorithm to perturb users' profiles under a strict privacy notion, differential privacy, with the goal of preserving a user's identity in a tagging dataset. The algorithm includes three privacy-preserving operations: Private Tag Clustering is used to shrink the randomized domain and Private Tag Selection is then applied to find the most suitable replacement tags for the original tags. To hide the numbers of tags, the third operation, Weight Perturbation, finally adds Laplace noise to the weight of tags. We present extensive experimental results on two real world datasets, Delicious and Bibsonomy. While the personalization algorithm is successful in both cases.


Tagging recommender systems allow Internet users to annotate resources with personalized tags. The connection among users, resources and these annotations, often called a folksonomy, permits users the freedom to explore tags, and to obtain recommendations. Releasing these tagging datasets accelerates both commercial and research work on recommender systems. However, tagging recommender systems have been confronted with serious privacy concerns because adversaries may re-identify a user and her/his sensitive information from the tagging dataset using a little background information. Recently, several private techniques have been proposed to address the problem, but most of them lack a strict privacy notion, and can hardly resist the number of possible attacks. This paper proposes a private releasing algorithm to perturb users' profiles under a strict privacy notion, differential privacy, with the goal of preserving a user's identity in a tagging dataset. The algorithm includes three privacy-preserving operations: Private Tag Clustering is used to shrink the randomized domain and Private Tag Selection is then applied to find the most suitable replacement tags for the original tags. To hide the numbers of tags, the third operation, Weight Perturbation, finally adds Laplace noise to the weight of tags. We present extensive experimental results on two real world datasets, Del.icio.us and Bibsonomy. While the personalization algorithm is successful in both cases, our results further suggest the private releasing algorithm can successfully retain the utility of the datasets while preserving users' identity.
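
The third operation, adding Laplace noise to tag weights, can be illustrated with the standard Laplace mechanism. The sketch below assumes a noise scale of sensitivity divided by the privacy budget epsilon; the abstract does not state the sensitivity or how the budget is split across the three operations, so the function names, the default sensitivity of 1.0 and the sample profile are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_weights(tag_weights, epsilon, sensitivity=1.0, rng=None):
    """Return a copy of `tag_weights` with Laplace noise added to each weight.

    Assumption: scale = sensitivity / epsilon, as in the standard Laplace
    mechanism; smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return {tag: weight + laplace_noise(scale, rng)
            for tag, weight in tag_weights.items()}


# Hypothetical user profile: tag -> weight (e.g. usage count).
profile = {"python": 5.0, "privacy": 2.0, "music": 1.0}
print(perturb_weights(profile, epsilon=0.5, rng=random.Random(42)))
```

The noisy weights may become negative or non-integer; whether and how to post-process them is a design choice that this sketch leaves open.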