25 results for Structured data

in Deakin Research Online - Australia


Relevance:

80.00%

Publisher:

Abstract:

In many business situations, product or user profile data are so complex that they need to be described using tree structures. Evaluating the similarity between tree-structured data is essential in many applications, such as recommender systems. To evaluate the similarity between two trees, conceptually corresponding nodes should be identified by constructing an edit distance mapping between them. Sometimes, the intension of one concept includes the intensions of several other concepts. In that situation, a one-to-many mapping should be constructed from the point of view of structures. This paper proposes a tree similarity measure model that can construct this kind of mapping. The similarity measure model takes into account all the information on nodes’ concepts, weights, and values. The conceptual similarity and the value similarity between two trees are evaluated based on the constructed mapping, and the final similarity measure is assessed as a weighted sum of their conceptual and value similarities. The effectiveness of the proposed similarity measure model is shown by an illustrative example and is also demonstrated by applying it to a recommender system.
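The abstract says only that the final measure is a weighted sum of the conceptual and value similarities. As a minimal sketch of that last step, assuming both similarities are already computed and lie in [0, 1], and assuming a single hypothetical weight `alpha` (the abstract gives neither its name nor its value), the combination might look like this:

```python
def combined_similarity(conceptual: float, value: float, alpha: float = 0.5) -> float:
    """Weighted sum of a conceptual and a value similarity.

    `alpha` is a hypothetical weight on the conceptual part; the remaining
    weight (1 - alpha) goes to the value part. Both inputs are assumed to be
    precomputed scores in [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * conceptual + (1.0 - alpha) * value


# Illustrative numbers only; they do not come from the paper.
score = combined_similarity(0.8, 0.6, alpha=0.7)
```

Constructing the one-to-many node mapping that produces the two inputs is the substance of the paper and is not attempted here.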

Relevance:

70.00%

Publisher:

Abstract:

This paper introduces an incremental FP-Growth approach for Web-content-based data mining and its application to a real-world problem. The problem is solved as follows. First, we obtain semi-structured data from the Web pages of the Chinese car market, structure them, and save them in a local database. Second, we use an incremental FP-Growth algorithm for mining association rules to discover Chinese consumers' car consumption preferences. To find more general regularities, an attribute-oriented induction method is also used to find customers' consumption preferences across a range of car categories. Experimental results reveal some interesting consumption preferences that are useful for decision makers setting policy to encourage and guide car consumption. Although the data we used may not be fully representative of the actual market, they are still adequate for decision making, in that they reflect the real situation of car consumption preferences under the two assumptions in the context.
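To make the notion of an association rule concrete, the following sketch computes support and confidence for item pairs by brute-force counting. It is not FP-Growth and not the incremental variant the paper describes. The function name `pair_rules`, the thresholds, and the toy car attributes are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules between single items.

    Brute-force pair counting, used here only to illustrate support and
    confidence; FP-Growth reaches the same frequent itemsets more efficiently.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical records; each set holds the attributes of one purchase.
transactions = [
    {"sedan", "petrol", "mid_price"},
    {"sedan", "petrol"},
    {"suv", "diesel"},
    {"sedan", "mid_price"},
]
rules = pair_rules(transactions)
```

On this toy data, a rule such as petrol → sedan holds in every transaction that contains petrol, so its confidence is 1.0 with support 0.5.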

Relevance:

70.00%

Publisher:

Abstract:

Web data extraction systems are the kernel of information mediators between users and heterogeneous Web data resources. How to extract structured data from semi-structured documents has been a problem of active research. Supervised and unsupervised methods have been devised to learn extraction rules from training sets. However, preparing training sets (especially annotating them for supervised methods) is very time-consuming. We propose a framework for Web data extraction that logs users' access history and exploits it to assist automatic training set generation. We cluster accessed Web documents according to their structural details, define criteria to measure the importance of sub-structures, and then generate extraction rules. We also propose a method to adjust the rules according to historical data. Our experiments confirm the viability of our proposal.

Relevance:

70.00%

Publisher:

Abstract:

Technology has been the catalyst that has facilitated an explosion of organisational data in terms of its velocity, variety, and volume, resulting in a greater depth and breadth of potentially valuable information, previously unutilised. The variety of data accessible to organisations extends beyond traditional structured data to now encompass previously unobtainable and difficult-to-analyse unstructured data. In addition to exploiting data, organisations are now facing an even greater challenge of assessing data quality and identifying the impacts of lack of quality. The aim of this research is to contribute to data quality literature, focusing on improving the current understanding of business-related Data Quality (DQ) issues facing organisations. This review builds on existing Information Systems literature, and proposes further research in this area. Our findings confirm that the current literature lags in recognising new types of data and imminent DQ impacts facing organisations in today’s dynamic environment of the so-called “Big Data”. The insights clearly identify the need for further research on DQ, in particular in relation to unstructured data. They also raise questions regarding new DQ impacts and implications for organisations, in their quest to leverage the variety of available data types to provide richer insights.

Relevance:

60.00%

Publisher:

Abstract:

This paper presents a real application of Web-content mining using an incremental FP-Growth approach. We first restructure the semi-structured data retrieved from the web pages of the Chinese car market to fit into a local database, and then employ an incremental algorithm to discover association rules that identify car preferences. To find more general regularities, a method of attribute-oriented induction is also used to find customers’ consumption preferences. Experimental results show some interesting consumption preference patterns that may be beneficial for the government in making policy to encourage and guide car consumption.

Relevance:

60.00%

Publisher:

Abstract:

The emergence of new media—including branded websites, social media and mobile applications—has created additional touch points for unhealthy food and beverage companies to target children and adolescents. The aim of this study was to perform an audit of new media for three top selling food and beverage brands in Australia. The top selling brand in three of the most advertised food and beverage categories was identified. Facebook, websites and mobile phone applications from these three brands were assessed using a combination of descriptive analyses and structured data collection during June and July 2013. Information on target audience, main focus of the activity, marketing strategies employed and connectivity were collected. Promotional activities were assessed against industry self-regulatory codes. McDonald's, Coca-Cola and Cadbury Dairy Milk were audited, with 21 promotional activities identified. These promotional activities appeared to use a number of marketing strategies, with frequent use of indirect product association, engagement techniques and branding. We identified strategic targeting of both children and adolescents. We found that while all promotional activities technically met self-regulatory codes (usually due to media-specific age restrictions) a number appeared to employ unhealthy food or beverage marketing directed to children. Brands are using engaging content via new media aimed at children and adolescents to promote unhealthy food and beverages. Given the limitations of self-regulatory codes in the context of new media, strategies need to be developed to reduce exposure of children and adolescents to marketing of unhealthy food and beverage products via these avenues.

Relevance:

40.00%

Publisher:

Abstract:

This thesis advances several theoretical and practical aspects of the recently introduced restricted Boltzmann machine - a powerful probabilistic and generative framework for modelling data and learning representations. The contributions of this study represent a systematic and common theme in learning structured representations from complex data.

Relevance:

30.00%

Publisher:

Abstract:

The movement of chemicals through the soil to the groundwater, or their discharge to surface waters, represents a degradation of these resources. In many cases, serious human and stock health implications are associated with this form of pollution. The chemicals of interest include nutrients, pesticides, salts, and industrial wastes. Recent studies have shown that current models and methods do not adequately describe the leaching of nutrients through soil, often underestimating the risk of groundwater contamination by surface-applied chemicals and overestimating the concentration of resident solutes. This inaccuracy results primarily from ignoring soil structure and nonequilibrium between soil constituents, water, and solutes. A multiple sample percolation system (MSPS), consisting of 25 individual collection wells, was constructed to study the effects of localized soil heterogeneities on the transport of nutrients (NO₃⁻, Cl⁻, PO₄³⁻) in the vadose zone of an agricultural soil dominated by clay. Very significant variations in drainage patterns across a small spatial scale were observed (one-way ANOVA, p < 0.001), indicating considerable heterogeneity in water flow patterns and nutrient leaching. Using data collected from the multiple sample percolation experiments, this paper compares the performance of two mathematical models for predicting solute transport: the advective-dispersion model with a reaction term (ADR), and a two-region preferential flow model (TRM) suitable for modelling nonequilibrium transport. These results have implications for modelling solute transport and predicting nutrient loading on a larger scale.

Relevance:

30.00%

Publisher:

Abstract:

Consensus guidelines advocate the treatment of constipation and faecal impaction in order to improve symptoms of urinary frequency, urgency and urinary incontinence and to promote bladder emptying in the absence of urinary tract obstruction. This structured review of the literature was undertaken to search for and appraise evidence to support or negate the hypothesis of this relationship. The search strategy was comprehensive and identified six relevant studies. Two of these had been conducted on an adult population and four involved children with constipation. These studies were appraised for methodological quality. It was found that sample sizes were small and evidence was inconsistent. Variable methods of reporting meant that data could not be pooled for meta-analysis.
Based on the limited and conflicting evidence, it is recommended that further research be undertaken to identify any correlation between bowel and bladder function.

Relevance:

30.00%

Publisher:

Abstract:

Collecting, analyzing, and making molecular-biological annotation data accessible in different public data sources is still an ongoing project. Integration of such data from these data sources might lead to valuable biological knowledge. Annotation data are numerous, and only some of them are structured. The number and contents of related sources are continuously increasing. In addition, the existing data sources each have their own storage structure and implementation. As a result, combining annotations across sources can be limited. Here, we propose a tool, called ANNODA, for integrating molecular-biological annotation data. Unlike past work on database interoperation in the bioinformatics community, this database design uses web-links, which are very useful for interactive navigation, while also supporting automated large-scale analysis tasks.

Relevance:

30.00%

Publisher:

Abstract:

The peer-to-peer content distribution network (PCDN) has recently become a hot topic, and it has huge potential for massive data-intensive applications on the Internet. One of the challenges in PCDN is routing for data sources and data deliveries. In this paper, we study a type of network model formed by dynamic autonomy areas, structured source servers and proxy servers. Based on this network model, we propose a number of algorithms to address the routing and data delivery issues. Given the highly dynamic nature of the autonomy area, we established dynamic tree structure proliferation system routing, proxy routing and resource searching algorithms. The simulation results showed that the performance of the proposed network model and algorithms is stable.

Relevance:

30.00%

Publisher:

Abstract:

Purpose – The purpose of this article is to present an empirical analysis of complex sample data with regard to the biasing effect of non-independence of observations on standard error parameter estimates. Using field data structured in the form of repeated measurements, it is shown, in a two-factor confirmatory factor analysis model, how the bias in SE can be derived when the non-independence is ignored.

Design/methodology/approach – Three estimation procedures are compared: normal asymptotic theory (maximum likelihood); non-parametric standard error estimation (naïve bootstrap); and sandwich (robust covariance matrix) estimation (pseudo-maximum likelihood).

Findings – The study reveals that, when using either normal asymptotic theory or non-parametric standard error estimation, the SE bias produced by the non-independence of observations can be noteworthy.

Research limitations/implications – Considering the methodological constraints in employing field data, the three analyses examined must be interpreted independently and as a result taxonomic generalisations are limited. However, the study still provides “case study” evidence suggesting the existence of the relationship between non-independence of observations and standard error bias estimates.

Originality/value – Given the increasing popularity of structural equation models in the social sciences and in particular in the marketing discipline, the paper provides a theoretical and practical insight into how to treat repeated measures and clustered data in general, adding to previous methodological research. Some conclusions and suggestions for researchers who make use of partial least squares modelling are also drawn.
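As a minimal sketch of one of the three procedures compared above, the naïve (non-parametric) bootstrap, the following code estimates the standard error of a sample mean by resampling individual observations with replacement. It is deliberately far simpler than the two-factor confirmatory factor analysis in the article; the function name `bootstrap_se`, the resample count, the seed, and the data are illustrative assumptions.

```python
import random
import statistics


def bootstrap_se(sample, n_resamples=2000, seed=0):
    """Naive bootstrap standard error of the mean.

    Each resample draws single observations independently with replacement,
    so any clustering or repeated-measures structure in `sample` is ignored.
    """
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    # The spread of the resampled means estimates the standard error.
    return statistics.stdev(means)


# Illustrative data only.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
se = bootstrap_se(data)
```

Because every draw treats observations as exchangeable, this procedure relies on the independence assumption that repeated measurements violate, which is consistent with the bias the article reports for it.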

Relevance:

30.00%

Publisher:

Abstract:

Background: Childhood mental health problems are prevalent in Australian children (14–20%). Social exclusion is a risk factor for mental health problems, whereas being socially included can have protective effects. This study aims to identify the barriers to social inclusion for children aged 9–12 years living in low socio-economic status (SES) areas, using both child-report and parent-report interviews.

Methods: Australian-born English-speaking parents and children aged 9–12 years were sampled from a low SES area to participate in semi-structured interviews. Parents and children were asked questions around three prominent themes of social exclusion: exclusion from school, from social activities and from social networks.

Results: Many children experienced social exclusion at school, from social activities or within social networks. Overall, nine key barriers to social inclusion were identified through parent and child interviews, such as inability to attend school camps and participate in school activities, bullying and being left out, time and transport constraints, financial constraints and safety and traffic concerns. Parents and children often identified different barriers.

Discussion: There are several barriers to social inclusion for children living in low SES communities, many of which can be used to facilitate mental health promotion programmes. Given that parents and children may report different barriers, it is important to seek both perspectives.

Conclusion: This study strengthens the evidence base for the investments and action required to bring about the conditions for social inclusion for children living in low SES communities.

Relevance:

30.00%

Publisher:

Abstract:

Background There is conflicting evidence regarding levels of leptin in depression. In this study we aimed to investigate the relationship between serum leptin level and depression in a community sample of women using both cross-sectional and longitudinal data.

Methods From among 510 women aged 20–78 yr, 83 were identified with a lifetime history of major depressive disorder or dysthymia, ascertained using the Structured Clinical Interview for DSM-IV-TR Research Version, Non-patient edition (SCID-I/NP). Serum leptin levels were measured by radioimmunoassay. Medication use and lifestyle were self-reported and body mass index (BMI) determined from measures of height and weight.

Results Using multiple linear regression, serum leptin levels were greater among women with a lifetime history of depression compared to women without any history of depression, independent of BMI. Adjusted geometric mean values of serum leptin were 16.37 (95%CI 14.70–18.23) ng/mL for depressed and 14.46 (95%CI 13.79–15.16) ng/mL for non-depressed women (P = 0.039). The hazard ratio (HR) for a de novo depressive disorder over five years increased 2.56-fold for each standard deviation increase in log-transformed serum leptin among non-smokers and this was not explained by differences in BMI, medications or other lifestyle factors (HR = 2.56, 95%CI 1.52-4.30). No association was observed for smokers.

Limitations There is potential for unrecognised confounding, recall bias and transient changes in body composition.

Conclusion Women with a lifetime history of depression have elevated levels of serum leptin, and elevated serum leptin predicts subsequent development of a depressive disorder.

Relevance:

30.00%

Publisher:

Abstract:

This data collection consists of:
• Raw interview data from semi-structured interviews
• Transcriptions of interviews
• Photographs
• Sound files
• Diary data from the learning object development process

Data is drawn from respondents in the Visy Industrial Packaging factory site. The data is themed according to the Deakin graduate attributes.