8 results for binary data

in Deakin Research Online - Australia


Relevance:

60.00%

Publisher:

Abstract:

Despite the growing interest in dietary patterns, there have been few longitudinal investigations. The objective of the present study was to extend an earlier method of dietary pattern assessment to longitudinal binary data and to assess changes in patterns over time and in relation to socio-demographic covariates. A prospective national cohort of 1265 participants completed a 5 d food diary at three time-points during their adult life (at age 36 years in 1982, 43 years in 1989 and 53 years in 1999). Factor analysis identified three dietary patterns for women (fruit, vegetables and dairy; ethnic foods and alcohol; meat, potatoes and sweet foods) and two patterns in men (ethnic foods and alcohol; mixed). Trends in dietary pattern scores were calculated using random effects models. Marked changes were found in scores for all patterns between 1989 and 1999, with only the meat, potatoes and sweet foods pattern in women recording a decline. In a multiple variable model that included the three time-points, socio-demographic variables and BMI time-dependent covariates, both non-manual social class and higher education level were also strongly associated with the consumption of more items from the ethnic foods and alcohol pattern and the mixed pattern for men (P<0·0001) and the fruit, vegetables and dairy pattern and the ethnic foods and alcohol pattern for women (P<0·01). In conclusion, longitudinal changes in dietary patterns and across socio-economic groups can assist with targeting public health initiatives by identifying stages during adult life when interventions to improve diet would be most beneficial to health.

Relevance:

60.00%

Publisher:

Abstract:

Regular expressions are used to parse textual data to match patterns and extract variables. They have been implemented in a vast number of programming languages, with a significant quantity of research devoted to improving their operational efficiency. However, regular expressions are limited to finding linear matches. Little research has been done in the field of object-oriented results, which would allow textual or binary data to be converted to multi-layered objects. This is significantly relevant as many of today's data formats are object-based. This paper extends our previous work by detailing an algorithmic approach to perform object-oriented parsing, and provides an initial benchmark study of the algorithms of our contribution.
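The limitation the abstract describes can be illustrated with a minimal sketch (this is not the paper's algorithm; the toy grammar and names are invented for illustration): a regular expression alone only yields flat, linear matches, so recovering a multi-layered object means driving the matcher from a small recursive parser.

```python
import re

# Toy grammar (hypothetical, for illustration only):
#   object := NAME '{' field (',' field)* '}'
#   field  := NAME ':' (VALUE | object)
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[{}:,])")

def tokenize(text):
    # The regex produces a flat, linear token stream -- all it can do.
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(tokens):
    # The recursion, not the regex, supplies the multi-layered structure.
    obj = {"_type": tokens.pop(0)}
    assert tokens.pop(0) == "{"
    while True:
        field = tokens.pop(0)
        assert tokens.pop(0) == ":"
        # Lookahead: NAME followed by '{' starts a nested object.
        if len(tokens) > 1 and tokens[1] == "{":
            obj[field] = parse(tokens)
        else:
            obj[field] = tokens.pop(0)
        if tokens.pop(0) == "}":      # otherwise the popped token was ','
            return obj

record = parse(tokenize("packet{header:ip{version:4,length:20},payload:data}"))
```

Here `record` comes back as a nested dictionary, so `record["header"]["version"]` yields `"4"` rather than a flat match group.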

Relevance:

30.00%

Publisher:

Abstract:

Background: Feature selection techniques are critical to the analysis of high-dimensional datasets. This is especially true for gene selection from microarray data, which commonly have an extremely high feature-to-sample ratio. In addition to essential objectives such as reducing data noise, reducing data redundancy, improving sample classification accuracy, and improving model generalization, feature selection also helps biologists focus on the selected genes to further validate their biological hypotheses.
Results: In this paper we describe an improved hybrid system for gene selection, based on a recently proposed genetic ensemble (GE) system. To enhance the generalization of the selected genes or gene subsets and to overcome the overfitting problem of the GE system, we devised a mapping strategy to fuse the goodness information of each gene provided by multiple filtering algorithms. This information is then used in the initialization and mutation operations of the genetic ensemble system.
Conclusion: We used four benchmark microarray datasets (including both binary-class and multi-class classification problems) for proof of concept and model evaluation. The experimental results indicate that the proposed multi-filter enhanced genetic ensemble (MF-GE) system is able to improve sample classification accuracy, generate more compact gene subsets, and converge to the selection results more quickly. The MF-GE system is very flexible, as various combinations of multiple filters and classifiers can be incorporated based on the data characteristics and user preferences.
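The fusion idea can be sketched as follows (a hypothetical simplification, not the MF-GE implementation; the filter scores are invented): each filter method scores every gene, the scores are rank-normalised and averaged into a fused "goodness" value, and that value biases the random initial population of the genetic ensemble toward promising genes.

```python
import random

def fuse_ranks(score_lists):
    # Rank-normalise each filter's scores to [0, 1] (0 = worst, 1 = best)
    # and average across filters, so no single filter's scale dominates.
    n = len(score_lists[0])
    fused = [0.0] * n
    for scores in score_lists:
        order = sorted(range(n), key=lambda g: scores[g])
        for rank, g in enumerate(order):
            fused[g] += rank / (n - 1)
    return [f / len(score_lists) for f in fused]

def init_population(fused, pop_size, rng):
    # Each chromosome is a binary gene mask; the probability of selecting
    # a gene is its fused goodness, biasing the GA's starting point.
    return [[1 if rng.random() < p else 0 for p in fused]
            for _ in range(pop_size)]

rng = random.Random(1)
filters = [[0.9, 0.1, 0.5, 0.3],   # three hypothetical filter scorings
           [0.8, 0.2, 0.6, 0.1],   # of the same four genes
           [0.7, 0.3, 0.4, 0.2]]
fused = fuse_ranks(filters)
pop = init_population(fused, 6, rng)
```

Rank fusion rather than raw-score averaging is one plausible mapping strategy here, since different filters report incommensurable score scales.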

Relevance:

30.00%

Publisher:

Abstract:

Mg-Zn binary alloys with concentrations between 0 and 2.8 wt% Zn have been prepared and processed via hot rolling and annealing to produce specimens with a strong basal texture and a range of grain sizes. These have been deformed in tension, a condition in which the deformation is dominated by prismatic slip. These data have been used to assess the Hall-Petch parameter as a function of Zn concentration for deformation dominated by prismatic slip. Pure magnesium showed non-linear Hall-Petch behaviour at large grain sizes, and this is compared to the values for prismatic slip measured on single crystals. The differences between critical resolved shear stress measurements made through single-crystal, polycrystal and mathematical modelling techniques are also discussed.
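The Hall-Petch analysis referred to above fits yield stress against the inverse square root of grain size, sigma_y = sigma_0 + k_y * d**-0.5. The grain sizes and stresses below are hypothetical (the study's actual values are not reproduced here); they only show how the friction stress sigma_0 and the Hall-Petch slope k_y fall out of a least-squares line.

```python
# Hypothetical grain sizes (micrometres) and yield stresses (MPa).
d = [5.0, 10.0, 20.0, 50.0, 100.0]
sigma_y = [180.0, 150.0, 130.0, 112.0, 102.0]

# Hall-Petch plots sigma_y against d**-0.5: the slope of the fitted
# line is k_y and the intercept is the friction stress sigma_0.
x = [di ** -0.5 for di in d]
n = len(x)
x_mean = sum(x) / n
y_mean = sum(sigma_y) / n
k_y = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, sigma_y))
       / sum((xi - x_mean) ** 2 for xi in x))
sigma_0 = y_mean - k_y * x_mean
print(f"sigma_0 = {sigma_0:.1f} MPa, k_y = {k_y:.1f} MPa*um^0.5")
```

The non-linear behaviour the abstract reports for pure magnesium at large grain sizes would show up here as systematic residuals from this straight line.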

Relevance:

30.00%

Publisher:

Abstract:

Analysis and fusion of social measurements is important to understand what shapes the public's opinion and the sustainability of global development. However, modelling data collected from social responses is challenging, as the data are typically complex and heterogeneous, taking the form of stated facts, subjective assessments, choices, preferences or any combination thereof. Model-wise, these responses are a mixture of data types including binary, categorical, multicategorical, continuous, ordinal, count and rank data. The challenge is therefore to effectively handle mixed data in a unified fusion framework in order to perform inference and analysis. To that end, this paper introduces eRBM (Embedded Restricted Boltzmann Machine) – a probabilistic latent variable model that can represent mixed data using a layer of hidden variables shared transparently across the different data types. The proposed model can comfortably support large-scale data analysis tasks, including distribution modelling, data completion, prediction and visualisation. We demonstrate these versatile features on several moderate and large-scale publicly available social survey datasets.
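A hypothetical sketch of the kind of encoding such a mixed-data model implies: heterogeneous survey answers are mapped into one numeric vector so that a shared hidden layer can see every type at once. (The actual eRBM handles each type with its own likelihood; the schema, field names and codes below are invented purely for illustration.)

```python
def encode_response(response, schema):
    # Map one survey response into a single flat numeric vector,
    # with a type-appropriate code per field.
    vec = []
    for field, kind in schema:
        value = response[field]
        if kind == "binary":
            vec.append(float(value))
        elif kind.startswith("categorical:"):
            levels = kind.split(":", 1)[1].split(",")
            vec.extend(1.0 if value == lev else 0.0 for lev in levels)  # one-hot
        elif kind == "ordinal5":
            # Thermometer code: a 1-5 rating becomes five cumulative bits,
            # preserving the ordering that one-hot would discard.
            vec.extend(1.0 if value >= k else 0.0 for k in range(1, 6))
        elif kind == "continuous":
            vec.append(float(value))
    return vec

schema = [("employed", "binary"),
          ("region", "categorical:north,south,east,west"),
          ("satisfaction", "ordinal5"),
          ("age", "continuous")]
vec = encode_response({"employed": 1, "region": "east",
                       "satisfaction": 3, "age": 42.0}, schema)
```

The thermometer code for the ordinal field is one common choice; it keeps "4 out of 5" closer to "5 out of 5" than to "1 out of 5" in the encoded space.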

Relevance:

30.00%

Publisher:

Abstract:

Autism Spectrum Disorder (ASD) is growing at a staggering rate, but little is known about the cause of this condition. Inferring learning patterns from therapeutic performance data, and subsequently clustering ASD children into subgroups, is important to understand this domain and, more importantly, to inform evidence-based intervention. However, this data-driven task was difficult in the past due to insufficient data for reliable analysis. For the first time, using data from a recent application for early intervention in autism (TOBY Play pad), whose download count now exceeds 4500, we present in this paper the automatic discovery of learning patterns across 32 skills in sensory, imitation and language. We use unsupervised learning methods for this task, but a notorious problem with existing methods is that the number of patterns must be correctly specified in advance, which in our case is even more difficult due to the complexity of the data. To this end, we appeal to recent Bayesian nonparametric methods, in particular Bayesian Nonparametric Factor Analysis. This model uses the Indian Buffet Process (IBP) as a prior on a binary matrix with infinitely many columns to allocate groups of intervention skills to children. The optimal number of learning patterns as well as the subgroup assignments are inferred automatically from the data. Our experimental results follow an exploratory approach, presenting newly discovered learning patterns. To provide quantitative results, we also report a clustering evaluation against K-means and Non-negative Matrix Factorization (NMF). In addition to the novelty of this new problem, we demonstrate the suitability of Bayesian nonparametric models over parametric rivals.
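The IBP prior mentioned above can be simulated in a few lines (a minimal sketch of the prior only, not the paper's full factor-analysis model or inference): each "customer" (child) takes previously opened "dishes" (learning-pattern columns) in proportion to their popularity, then opens a Poisson-distributed number of new columns, so the number of patterns is not fixed in advance.

```python
import math
import random

def poisson(lam, rng):
    # Knuth's algorithm: count uniform draws until their product falls
    # below exp(-lam); adequate for the small rates used here.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def sample_ibp(num_customers, alpha, rng):
    columns = []                       # each column: one 0/1 entry per customer
    for n in range(1, num_customers + 1):
        for col in columns:
            popularity = sum(col) / n  # fraction of earlier customers who took it
            col.append(1 if rng.random() < popularity else 0)
        # Customer n then opens Poisson(alpha / n) brand-new columns.
        for _ in range(poisson(alpha / n, rng)):
            columns.append([0] * (n - 1) + [1])
    return columns

Z = sample_ibp(8, 2.0, random.Random(0))  # 8 children, concentration alpha = 2.0
```

The resulting binary matrix `Z` has as many columns as the process happened to open; posterior inference over matrices like this is what lets the model infer the number of learning patterns from data.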

Relevance:

30.00%

Publisher:

Abstract:

Malware replicates itself and produces offspring with the same characteristics but different signatures by using code obfuscation techniques. Current generation anti-virus engines employ a signature-template type detection approach, where malware can easily evade existing signatures in the database. This reduces the capability of current anti-virus engines in detecting malware. In this paper, we propose a stepwise binary logistic regression-based dimensionality reduction technique for malware detection using application program interface (API) call statistics. Finding the most significant malware features using traditional wrapper-based approaches takes exponential complexity in the dimension (m) of the dataset with a brute-force search strategy, and order (m-1) complexity with a backward elimination filter heuristic. The novelty of the proposed approach is that its worst-case computational complexity is less than order (m-1). The proposed approach uses multi-linear regression and the p-value of each individual API feature to select the most uncorrelated and significant features, in order to reduce the dimensionality of the large malware data and to ensure the absence of multi-collinearity. The stepwise logistic regression approach is then employed to test the significance of each individual malware feature based on its corresponding Wald statistic and to construct the binary decision model. When the selected most significant APIs are used in a decision rule generation system, this approach not only reduces the tree size but also improves classification performance. Exhaustive experiments on a large malware dataset show that the proposed approach clearly exceeds the existing standard decision rule and support vector machine-based template approaches with complete data, and provides a better statistical fit.
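The Wald test at the heart of stepwise logistic regression can be sketched on invented data (this is not the paper's dataset or code; `x` plays the role of one hypothetical API's call count and `y = 1` marks malware): fit log-odds(y) = b0 + b1*x by Newton-Raphson, then judge the feature by z = b1 / SE(b1).

```python
import math

def logit_wald(x, y, iters=25):
    b0 = b1 = 0.0
    for _ in range(iters):
        # Gradient and 2x2 observed information of the log-likelihood.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            t = max(-35.0, min(35.0, b0 + b1 * xi))  # guard exp overflow
            p = 1.0 / (1.0 + math.exp(-t))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: H^-1 @ gradient
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)            # sqrt of [H^-1] entry for b1
    return b1, b1 / se_b1                   # coefficient and Wald z statistic

# Hypothetical feature: call count of one API; label: 1 = malware.
x = [0, 1, 1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
beta, z = logit_wald(x, y)
```

In a stepwise procedure this z statistic (or its p-value) is what decides whether the API feature enters or leaves the model at each step.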