Online communities offer a platform to support and discuss health issues. They provide a more accessible way to bring together people with the same concerns or interests. This paper studies the characteristics of online autism communities (called Clinical) in comparison with other online communities (called Control), using data from 110 Live Journal weblog communities. Using machine learning techniques, we comprehensively analyze these online autism communities. We study three key aspects expressed in the blog posts made by members of the communities: sentiment, topics and language style. Sentiment analysis shows that the sentiment of the Clinical group has lower valence, indicative of poorer mood than in the Control group. Topics and language styles are shown to be good predictors of autism posts. The results show the potential of social media in medical studies for a broad range of purposes, such as screening, monitoring and subsequently providing support for online communities of individuals with special needs.


Tagging recommender systems allow Internet users to annotate resources with personalized tags and give users the freedom to obtain recommendations. However, they are usually confronted with serious privacy concerns, because adversaries may re-identify a user and her/his sensitive tags with only a little background information. This paper proposes a privacy-preserving tagging release algorithm, PriTop, which is designed to protect users under the notion of differential privacy. The proposed PriTop algorithm includes three privacy-preserving operations: Private Topic Model Generation structures the uncontrolled tags; Private Weight Perturbation adds Laplace noise to the weights to hide the numbers of tags; and Private Tag Selection finally finds the most suitable replacement tags for the original tags. We present extensive experimental results on four real-world datasets, and the results suggest that the proposed PriTop algorithm can successfully retain the utility of the datasets while preserving privacy. © 2014 Springer International Publishing.


We propose a novel hierarchical Bayesian framework, the word-distance-dependent Chinese restaurant franchise (wd-dCRF), for topic discovery from a document corpus regularized by side information in the form of word-to-word relations, with an application to Electronic Medical Records (EMRs). Typically, an EMR dataset consists of several patients (documents), and each patient record contains many diagnosis codes (words). We exploit the side information available in the form of a semantic tree structure among the diagnosis codes for semantically coherent disease topic discovery. We introduce novel functions to compute word-to-word distances when side information is available in the form of tree structures. We derive an efficient inference method for the wd-dCRF using an MCMC technique. We evaluate the model on a real-world medical dataset consisting of about 1000 patients with PolyVascular disease. Compared with the popular topic analysis tool, the hierarchical Dirichlet process (HDP), our model discovers topics that are superior in terms of both qualitative and quantitative measures.
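
As a rough illustration of word-to-word distances derived from a tree, the sketch below counts the edges on the path between two codes through their lowest common ancestor. This is a generic tree distance, not necessarily one of the functions introduced in the paper, and the `parent` map and code labels are hypothetical.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root (inclusive)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of edges between a and b via their lowest common ancestor."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes do not share a common ancestor")


# Hypothetical code hierarchy: child -> parent (the root has no entry).
parent = {
    "A01": "A0", "A02": "A0", "A0": "A",
    "B01": "B0", "B0": "B",
    "A": "ROOT", "B": "ROOT",
}

print(tree_distance("A01", "A02", parent))  # siblings under the same block
print(tree_distance("A01", "B01", parent))  # codes in different chapters
```

Codes that share a near ancestor get a small distance, which is the kind of signal a distance-dependent prior could use to favor grouping related codes into the same topic.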


Child sexual abuse has a serious impact on victims, their families and the broader community. As such, there is a critical need for sound research evidence to inform specialist responses. Increasingly, researchers are utilising administrative databases to track outcomes of individual cases across health, justice and other government agencies. There are unique advantages to this approach, including the ability to access a rich source of information at a population-wide level. However, the potential limitations of utilising administrative databases have not been fully explored. Because these databases were created originally for administrative rather than research purposes, there are significant problems with using this data at face value for research projects. We draw on our collective research experience in child sexual abuse to highlight common problems that have emerged when applying administrative databases to research questions. Some of the problems discussed include identification of relevant cases, ensuring reliability and dealing with missing data. Our article concludes with recommendations for researchers and policy-makers to enhance data quality.


INTRODUCTION: The proportion of patients who die during or after surgery, otherwise known as the perioperative mortality rate (POMR), is a credible indicator of the safety and quality of operative care. Its accuracy and usefulness as a metric, however, particularly one that enables valid comparisons over time or between jurisdictions, have been limited by the lack of a standardized approach to measurement and calculation and by poor understanding of when in relation to surgery it is best measured and whether risk-adjustment is needed. Our aim was to evaluate the value of POMR as a global surgery metric by addressing these issues using 4 large, mixed surgical datasets that represent high-, middle-, and low-income countries. METHODS: We obtained data from the New Zealand National Minimum Dataset, the Geelong Hospital patient management system in Australia, and purpose-built surgical databases in Pietermaritzburg, South Africa, and Port Moresby, Papua New Guinea. For each site, we calculated the POMR overall as well as for nonemergency and emergency admissions. We assessed the effect of using admission episodes or procedures as the denominator, and the difference between in-hospital POMR and POMR including postdischarge deaths up to 30 days. To determine the need for risk-adjustment for age and admission urgency, we used univariate and multivariate logistic regression to assess the effect on relative POMR for each site. RESULTS: A total of 1,362,635 patient admissions involving 1,514,242 procedures were included. More than 60% of admissions in Pietermaritzburg and Port Moresby were emergencies, compared with less than 30% in New Zealand and Geelong. Also, Pietermaritzburg and Port Moresby had much younger patient populations (P < .001). A total of 8,655 deaths were recorded within 30 days, and 8-20% of in-hospital deaths occurred on the same day as the first operation. 
In-hospital POMR ranged approximately 9-fold, from 0.38 per 100 admissions in New Zealand to 3.44 per 100 admissions in Pietermaritzburg. In New Zealand, in-hospital 30-day POMR underestimated total 30-day POMR by approximately one third. The difference in POMR if procedures were used instead of admission episodes ranged from 7 to 70%, although this difference was smaller when central line and pacemaker insertions were excluded. Age older than 65 years and emergency admission had large, independent effects on POMR but relatively little effect in multivariate analysis on the relative odds of in-hospital death at each site. CONCLUSION: It is possible to collect POMR in countries at all levels of development. Although age and admission urgency have strong, independent associations with POMR, a substantial amount of its variance is site-specific and may reflect the safety of operative and anesthetic facilities and processes. Risk-adjustment is desirable but not essential for monitoring system performance. POMR varies depending on the choice of denominator, and in-hospital deaths appear to underestimate 30-day mortality by up to one third. Standardized approaches to reporting and analysis will strengthen the validity of POMR as the principal indicator of the safety of surgery and anesthesia care.
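
To make the denominator effect concrete, the following sketch computes POMR per 100 admissions and per 100 procedures from the totals quoted in the abstract. The pooled rates are an arithmetic illustration only: the abstract does not report them, the sites differ in case mix and follow-up, and the helper name `pomr_per_100` is an assumption.

```python
def pomr_per_100(deaths, denominator):
    """Perioperative mortality rate expressed per 100 units of the denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * deaths / denominator


# Totals quoted in the abstract (all four sites pooled).
DEATHS_30D = 8_655
ADMISSIONS = 1_362_635
PROCEDURES = 1_514_242

by_admission = pomr_per_100(DEATHS_30D, ADMISSIONS)
by_procedure = pomr_per_100(DEATHS_30D, PROCEDURES)

# Relative change in POMR when procedures replace admissions as the denominator.
relative_drop = 100.0 * (by_admission - by_procedure) / by_admission

# Ratio of the highest to the lowest in-hospital POMR reported in the abstract.
fold_range = 3.44 / 0.38

print(f"POMR per 100 admissions: {by_admission:.2f}")
print(f"POMR per 100 procedures: {by_procedure:.2f}")
print(f"Relative drop with procedure denominator: {relative_drop:.1f}%")
print(f"Fold range across sites: {fold_range:.1f}")
```

Because one admission can involve several procedures, the procedure-based rate is always the lower of the two, which is why the choice of denominator has to be stated before rates are compared.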


Although the randomized controlled trial is the gold standard in medical research, researchers are increasingly looking to alternative data sources for hypothesis generation and early-stage evidence collection. Coded clinical data are collected routinely in most hospitals. While they contain rich information directly related to the real clinical setting, they are both noisy and semantically diverse, making them difficult to analyze with conventional statistical tools. This paper presents a novel application of Bayesian nonparametric modeling to uncover latent information in coded clinical data. For a patient cohort, a Bayesian nonparametric model is used to reveal the common comorbidity groups shared by the patients and the proportion in which each comorbidity group is reflected in each individual patient. To demonstrate the method, we present a case study based on hospitalization coding from an Australian hospital. The model recovered 15 comorbidity groups among 1012 patients hospitalized during one month. When patients from two areas of unequal socio-economic status were compared, the model revealed a higher prevalence of diverticular disease in the region of lower socio-economic status. The study builds a convincing case for using routine coded data to speed up hypothesis generation.


Tagging recommender systems provide users the freedom to explore tags and obtain recommendations. The releasing and sharing of these tagging datasets will accelerate both commercial and research work on recommender systems. However, releasing the original tagging datasets is usually confronted with serious privacy concerns, because adversaries may re-identify a user and her/his sensitive information from tagging datasets with only a little background information. Recently, several privacy techniques have been proposed to address the problem, but most of these lack a strict privacy notion and rarely prevent individuals from being re-identified from the dataset. This paper proposes a privacy-preserving tag release algorithm, PriTop. This algorithm is designed to satisfy differential privacy, a strict privacy notion with the goal of protecting users in a tagging dataset. The proposed PriTop algorithm includes three privacy-preserving operations: private topic model generation structures the uncontrolled tags; private weight perturbation adds Laplace noise to the weights to hide the numbers of tags; and private tag selection finally finds the most suitable replacement tags for the original tags, so that the exact tags can be hidden. We present extensive experimental results on four real-world datasets: Delicious, MovieLens, Last.fm and BibSonomy. While the recommendation algorithm is successful in all cases, our results further suggest that the proposed PriTop algorithm can successfully retain the utility of the datasets while preserving privacy.
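
The weight perturbation step can be illustrated with the standard Laplace mechanism. The sketch below is a minimal version under stated assumptions, not PriTop's actual procedure: the per-topic weight dictionary, the `sensitivity` of 1 and the `epsilon` value are hypothetical, and no post-processing (such as clamping negative weights) is applied.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two iid exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_weights(weights, epsilon, sensitivity=1.0, rng=None):
    """Add Laplace noise with scale sensitivity/epsilon to every weight."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return {topic: w + laplace_noise(scale, rng) for topic, w in weights.items()}


# Hypothetical per-topic tag counts for one user.
user_weights = {"topic_0": 12.0, "topic_1": 3.0, "topic_2": 0.0}
noisy = perturb_weights(user_weights, epsilon=0.5, rng=random.Random(7))
print(noisy)
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy at the cost of utility, which is the trade-off the experiments in the abstract evaluate.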


Database query verification schemes provide correctness guarantees for database queries. Typically, such guarantees are required and advisable where queries are executed on untrusted servers. This need to verify query results, even though they may have been executed on one’s own database, is something new that has arisen with the advent of cloud services. The traditional model of hosting one’s own databases on one’s own servers did not require such verification because the hardware and software were both entirely within one’s control, and therefore fully trusted. However, with the economic and technological benefits of cloud services beckoning, many are now considering outsourcing both data and execution of database queries to the cloud, despite obvious risks. This survey paper provides an overview of the field of database query verification and explores the current state of the art in terms of query execution and the correctness guarantees provided for query results. We also provide indications of future work in the area.


Database query verification schemes attempt to provide authenticity, completeness, and freshness guarantees for queries executed on untrusted cloud servers. A number of such schemes currently exist in the literature, allowing query verification for queries that are based on matching whole values (such as numbers, dates, etc.) or for queries based on keyword matching. However, there is a notable gap in the research with regard to query verification schemes for pattern-matching queries. Our contribution here is to provide such a verification scheme that provides correctness guarantees for pattern-matching queries executed on the cloud. We describe a trivial scheme and show how it does not provide completeness guarantees, and then proceed to describe our scheme based on efficient primitives such as cryptographic hashing and Merkle hash trees along with suffix arrays. We also provide experimental results based on a working prototype to show the practicality of our scheme.
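
For readers unfamiliar with Merkle hash trees, the sketch below shows the basic primitive: a client that keeps only the root hash can check that a record returned by an untrusted server belongs to the original dataset. It illustrates authenticity only and is not the scheme proposed in the paper, which also uses suffix arrays to address completeness; the record values and helper names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, leaf hashes first, root level last."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [h(b"\x00" + leaf) for leaf in leaves]  # 0x00 prefix marks leaves
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2 == 1:
            level = level + [level[-1]]  # duplicate last node on odd levels
        levels.append(level)
        if len(level) == 1:
            return levels
        level = [h(b"\x01" + level[i] + level[i + 1])  # 0x01 marks inner nodes
                 for i in range(0, len(level), 2)]


def make_proof(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from a leaf and its proof, then compare."""
    node = h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = h(b"\x01" + pair)
    return node == root


records = [b"alice", b"bob", b"carol", b"dave", b"erin"]
levels = build_levels(records)
root = levels[-1][0]            # the only value the client must keep
proof = make_proof(levels, 4)   # the server returns this with the record
print(verify(b"erin", proof, root))
print(verify(b"eve", proof, root))
```

The proof contains one hash per tree level, so verification cost grows logarithmically with the number of records.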


Notwithstanding recent work which has demonstrated the potential of using Twitter messages for content-specific data mining and analysis, the depth of such analysis is inherently limited by the scarcity of data imposed by the 140-character tweet limit. In this paper we describe a novel approach for targeted knowledge exploration which uses tweet content analysis as a preliminary step. This step is used to bootstrap more sophisticated data collection from directly related but much richer content sources. In particular, we demonstrate that valuable information can be collected by following URLs included in tweets. We automatically extract content from the corresponding web pages and, treating each web page as a document linked to the original tweet, show how a temporal topic model based on a hierarchical Dirichlet process can be used to track the evolution of a complex topic structure of a Twitter community. Using autism-related tweets, we demonstrate that our method is capable of capturing a much more meaningful picture of information exchange than user-chosen hashtags.
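
The first step of such a pipeline, pulling URLs out of tweet text so that the linked pages can later be fetched, might look like the sketch below. The regular expression, the punctuation stripping and the sample tweets are assumptions for illustration; the abstract does not describe how URLs were extracted or how pages were downloaded.

```python
import re

# Simple pattern for http(s) links; real tweets may need more careful handling.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_urls(tweet):
    """Return the URLs found in a tweet, without trailing punctuation."""
    return [url.rstrip(".,;:!?)\"'") for url in URL_PATTERN.findall(tweet)]


# Hypothetical tweets keyed by a made-up identifier.
tweets = {
    "t1": "New autism study out today: http://example.org/study. Thoughts? #autism",
    "t2": "Great talk this morning #autism",
}

# Map each tweet to the documents that would be fetched and topic-modeled.
linked_documents = {tid: extract_urls(text) for tid, text in tweets.items()}
print(linked_documents)
```

Keeping the tweet identifier next to each URL preserves the link between a fetched page and the tweet (and timestamp) it came from, which a temporal topic model needs.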


Probabilistic topic models have become a standard tool in modern machine learning for a wide range of applications. Representing data by the reduced-dimension mixture proportions extracted from topic models is not only richer in semantic interpretation but can also be informative for classification tasks. In this paper, we describe the Topic Model Kernel (TMK), a topic-based kernel for Support Vector Machine classification on data processed by probabilistic topic models. The applicability of our proposed kernel is demonstrated in several classification tasks with real-world datasets. TMK outperforms existing kernels on distributional features and gives comparable results on non-probabilistic data types.
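
The abstract does not state the functional form of TMK. As one plausible kernel on topic proportions, the sketch below exponentiates the negative Jensen-Shannon divergence between two probability vectors; this choice, the `gamma` parameter and the sample vectors are assumptions, not the authors' definition.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q), with 0 * log(0) taken as 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log), bounded by log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def distribution_kernel(p, q, gamma=1.0):
    """Similarity of two topic-proportion vectors: exp(-gamma * JS(p, q))."""
    return math.exp(-gamma * js_divergence(p, q))


def gram_matrix(docs, gamma=1.0):
    """Pairwise kernel values, usable by an SVM that accepts precomputed kernels."""
    return [[distribution_kernel(a, b, gamma) for b in docs] for a in docs]


# Hypothetical topic proportions for three documents over three topics.
docs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]
for row in gram_matrix(docs):
    print([round(v, 3) for v in row])
```

Documents with similar topic mixtures get kernel values close to 1, while documents dominated by different topics score lower.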


This article analyzes the importance of port delays for the competitiveness of the manufacturing industry in Brazil. Based on recent estimates of the daily cost of trade delays and on databases from the World Bank and GTAP (Global Trade Analysis Project), it reveals the magnitude of these barriers in the form of their ad valorem equivalents. Then, through general equilibrium simulations, it estimates the impact of improved customs procedures on the performance of the manufacturing industry in Brazil under different scenarios. The results highlight the strategic character of trade facilitation for Brazil and of its inclusion as a relevant item on the country's long-term growth agenda.


The debate on the age of criminal responsibility in the Brazilian National Congress goes beyond the discussion of Proposed Constitutional Amendment (Proposta de Emenda Constitucional) No. 171 of 1993, since the social and political situation, together with the deliberative context of the proposal in the Chamber of Deputies, has rekindled the dialogue in society. In view of this, the discussion on the age of criminal responsibility is mapped in order to determine the extent of the dispute in the National Congress. To this end, data and information made available by the Chamber of Deputies and the Federal Senate were collected, identifying the proposed constitutional amendments and the bills that address lowering the minimum age of criminal responsibility. Furthermore, from this selection of legislative proposals and bills it was possible to identify pertinent aspects of the debate, such as the positions of legislators and political parties, the arguments brought by both sides of the discussion, and the interests that may be affected by lowering the age of criminal responsibility. In addition, a brief study is made of the Statute of the Child and Adolescent (Estatuto da Criança e do Adolescente) and of the context of juvenile crime in Brazil, in order to help gauge the scale of the discussion taking place in Congress on the exemption of children and adolescents from criminal liability guaranteed by the Federal Constitution and by the Statute of the Child and Adolescent. Important issues, such as the Doctrine of Integral Protection and the factors that explain juvenile crime, are covered to support the mapping and the coherence of this work.


With the constant growth of enterprises, the need to share information across departments and business areas becomes more critical, and companies are turning to integration to provide a method for interconnecting heterogeneous, distributed and autonomous systems. Whether the sales application needs to interface with the inventory application or the procurement application needs to connect to an auction site, it seems that any application can be made better by integrating it with other applications. Integration between applications can face several difficulties because applications may not have been designed and implemented with integration in mind. Regarding integration issues, two-tier software systems, composed of the database tier and the “front-end” tier (interface), have shown some limitations. As a solution to overcome the two-tier limitations, three-tier systems were proposed in the literature. By adding a middle tier (referred to as middleware) between the database tier and the “front-end” tier (or simply the application), three main benefits emerge. The first benefit is that dividing software systems into three tiers enables increased integration capabilities with other systems. The second benefit is that modifications to individual tiers may be carried out without necessarily affecting the other tiers and integrated systems. The third benefit, a consequence of the others, is fewer maintenance tasks in the software system and in all integrated systems. Concerning software development in three tiers, this dissertation focuses on two emerging technologies, Semantic Web and Service Oriented Architecture, combined with middleware. 
These two technologies, blended with middleware, resulted in the development of the Swoat framework (Service and Semantic Web Oriented ArchiTecture) and lead to the following four synergistic advantages: (1) they allow the creation of loosely coupled systems, decoupling the database from the “front-end” tiers and therefore reducing maintenance; (2) the database schema is transparent to the “front-end” tiers, which are aware of the information model (or domain model) that describes what data is accessible; (3) integration with other heterogeneous systems is enabled through services provided by the middleware; (4) service requests by the “front-end” tier focus on ‘what’ data is needed and not on ‘where’ and ‘how’ issues, thereby reducing application development time.