11 results for data science

in Deakin Research Online - Australia


Relevance: 70.00%

Abstract:

The development and application of computational data mining techniques in financial fraud detection and business failure prediction has become a popular cross-disciplinary research area in recent times, involving financial economists, forensic accountants and computational modellers. Some of the computational techniques popularly used in the context of financial fraud detection and business failure prediction can also be effectively applied in the detection of fraudulent insurance claims and can therefore be of immense practical value to the insurance industry. We provide a comparative analysis of the prediction performance of a battery of data mining techniques using real-life automotive insurance fraud data. While the data we have used in our paper are US-based, the computational techniques we have tested can be adapted and generally applied to detect similar insurance frauds in other countries where an organized automotive insurance industry exists.

Relevance: 70.00%

Abstract:

This paper introduces a novel method for gene selection based on a modification of the analytic hierarchy process (AHP). The modified AHP (MAHP) is able to deal with quantitative factors that are statistics of five individual gene ranking methods: two-sample t-test, entropy test, receiver operating characteristic curve, Wilcoxon test, and signal-to-noise ratio. The most prominent discriminant genes serve as inputs to a range of classifiers including linear discriminant analysis, k-nearest neighbors, probabilistic neural network, support vector machine, and multilayer perceptron. Gene subsets selected by MAHP are compared with those of four competing approaches: information gain, symmetrical uncertainty, Bhattacharyya distance and ReliefF. Four benchmark microarray datasets (diffuse large B-cell lymphoma, leukemia cancer, prostate and colon) are utilized for experiments. As the number of samples in microarray datasets is limited, the leave-one-out cross-validation strategy is applied rather than traditional cross-validation. Experimental results demonstrate the significant dominance of the proposed MAHP over the competing methods in terms of both accuracy and stability. With the benefit of inexpensive computational cost, MAHP is useful for cancer diagnosis using DNA gene expression profiles in real clinical practice.
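As a rough illustration of the leave-one-out cross-validation strategy mentioned in this abstract, the sketch below holds out each sample once and classifies it with a 1-nearest-neighbor rule. It is not the authors' implementation: the function names, the choice of 1-NN, and the toy expression values are all assumptions made for illustration.

```python
import math


def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the training point closest to `query` (1-NN)."""
    best_index = min(
        range(len(train_x)),
        key=lambda i: math.dist(train_x[i], query),
    )
    return train_y[best_index]


def leave_one_out_accuracy(samples, labels):
    """Hold out each sample once, predict it from the rest, and average the hits."""
    hits = 0
    for i in range(len(samples)):
        # Training set is every sample except the i-th one.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_predict(train_x, train_y, samples[i]) == labels[i]:
            hits += 1
    return hits / len(samples)


# Made-up expression values for two hypothetical selected genes (not real data).
toy_samples = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.75),
]
toy_labels = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

loo_accuracy = leave_one_out_accuracy(toy_samples, toy_labels)
print(loo_accuracy)
```

With n samples, this procedure fits n models, each trained on n - 1 samples, which is why it suits small microarray datasets where holding out a larger fold would leave too little training data.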

Relevance: 60.00%

Abstract:

Low-cost pervasive electrocardiogram (ECG) monitors are changing how sinus arrhythmias are diagnosed among patients with mild symptoms. With the large amount of data generated from long-term monitoring come new data science and analytical challenges. Although traditional rule-based detection algorithms still work on relatively short clinical-quality ECG, they are not optimal for pervasive signals collected from wearable devices: they do not adapt to individual differences and they assume accurate identification of ECG fiducial points. To overcome these shortcomings of the rule-based methods, this paper introduces an arrhythmia detection approach for low-quality pervasive ECG signals. To achieve the robustness needed, two techniques were applied. First, a set of ECG features with minimal reliance on fiducial point identification was selected. Next, the features were normalized using robust statistics to factor out baseline individual differences and the clinically irrelevant temporal drift that is common in pervasive ECG. The proposed method was evaluated using pervasive ECG signals we collected, in combination with clinician-validated ECG signals from Physiobank. Empirical evaluation confirms accuracy improvements of the proposed approach over the traditional clinical rules.
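The abstract does not say which robust statistics were used for normalization. A common choice, assumed here purely for illustration, is to center each feature on its median and scale it by the median absolute deviation (MAD), so that a few extreme beats do not distort the baseline. The function name and the sample values are hypothetical.

```python
import statistics


def robust_normalize(values):
    """Center on the median and scale by the median absolute deviation (MAD)."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    if mad == 0:
        # All deviations collapse to zero; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - center) / mad for v in values]


# Made-up beat-to-beat intervals in milliseconds, with one outlying beat.
intervals_ms = [800, 810, 790, 805, 1500]
normalized = robust_normalize(intervals_ms)
print(normalized)
```

Unlike mean and standard deviation scaling, the median and MAD are barely moved by the single outlying interval, so the ordinary beats stay near zero while the outlier stands out clearly.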

Relevance: 60.00%

Abstract:

In data science, anomaly detection is the process of identifying the items, events or observations which do not conform to expected patterns in a dataset. As widely acknowledged in the computer vision community and security management, discovering suspicious events is the key issue for abnormality detection in video surveillance. The important steps in identifying such events include stream data segmentation and hidden pattern discovery. However, the crucial challenge in stream data segmentation and hidden pattern discovery is that the number of coherent segments in the surveillance stream and the number of traffic patterns are unknown and hard to specify. Therefore, in this paper we revisit the abnormality detection problem through the lens of Bayesian nonparametrics (BNP) and develop a novel usage of BNP methods for this problem. In particular, we employ the Infinite Hidden Markov Model and Bayesian Nonparametric Factor Analysis for stream data segmentation and pattern discovery. In addition, we introduce an interactive system allowing users to inspect and browse suspicious events.

Relevance: 60.00%

Abstract:

Physical activity is important for maintaining healthy lifestyles. Recommendations for physical activity levels are issued by most governments as part of public health measures. As such, reliable measurement of physical activity for regulatory purposes is vital. This has led research to explore standards for achieving this using wearable technology and artificial neural networks that produce classifications for specific physical activity events. Applied from a very early age, the ubiquitous capture of physical activity data using mobile and wearable technology may help us to understand how we can combat childhood obesity and the impact that this has in later life. A supervised machine learning approach is adopted in this paper that utilizes data obtained from accelerometer sensors worn by children in free-living environments. The paper presents a set of activities and features suitable for measuring physical activity and evaluates the use of a Multilayer Perceptron neural network to classify physical activities by activity type. A rigorous, reproducible data science methodology is presented for subsequent use in physical activity research. Our results show that it was possible to obtain an overall accuracy of 96% with 95% for sensitivity, 99% for specificity and a kappa value of 94% when three- and four-feature combinations were used.
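For readers unfamiliar with the four figures reported in this abstract, the sketch below shows how accuracy, sensitivity, specificity and Cohen's kappa follow from a confusion matrix. It is simplified to the two-class case, whereas the paper classifies several activity types, and the counts are made up for illustration rather than taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, sensitivity, specificity and Cohen's kappa from counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate

    # Agreement expected by chance, from the marginal totals of the matrix.
    chance_positive = ((tp + fp) / total) * ((tp + fn) / total)
    chance_negative = ((fn + tn) / total) * ((fp + tn) / total)
    expected = chance_positive + chance_negative

    # Kappa rescales accuracy so that 0 means chance-level agreement.
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts: 40 true positives, 10 false positives,
# 10 false negatives, 40 true negatives.
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```

Kappa is typically lower than raw accuracy because it discounts the agreement a classifier would reach by guessing according to the class frequencies.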

Relevance: 60.00%

Abstract:

Aggregation of amyloid-beta (Aβ) peptide is the major event underlying neuronal damage in Alzheimer's disease (AD). Specific lipids and their homeostasis play important roles in this and other neurodegenerative disorders. The complex interplay between the lipids and the generation, clearance or deposition of Aβ has been intensively investigated and is reviewed in this chapter. Membrane lipids can have an important influence on the biogenesis of Aβ from its precursor protein. In particular, increased cholesterol in the plasma membrane augments Aβ generation and shows a strong positive correlation with AD progression. Furthermore, apolipoprotein E, which transports cholesterol in the cerebrospinal fluid and is known to interact with Aβ or compete with it for the lipoprotein receptor binding, significantly influences Aβ clearance in an isoform-specific manner and is the major genetic risk factor for AD. Aβ is an amphiphilic peptide that interacts with various lipids, proteins and their assemblies, which can lead to variation in Aβ aggregation in vitro and in vivo. Upon interaction with the lipid raft components, such as cholesterol, gangliosides and phospholipids, Aβ can aggregate on the cell membrane and thereby disrupt it, perhaps by forming channel-like pores. This leads to perturbed cellular calcium homeostasis, suggesting that Aβ-lipid interactions at the cell membrane probably trigger the neurotoxic cascade in AD. Here, we overview the roles of specific lipids, lipid assemblies and apolipoprotein E in Aβ processing, clearance and aggregation, and discuss the contribution of these factors to the neurotoxicity in AD.

Relevance: 60.00%

Abstract:

Snapper (Pagrus auratus) is widely distributed throughout subtropical and temperate southern oceans and forms a significant recreational and commercial fishery in Queensland, Australia. Using data from government reports, media sources, popular publications and a government fisheries survey carried out in 1910, we compiled information on individual snapper fishing trips that took place prior to the commencement of fishery-wide organized data collection, from 1871 to 1939. In addition to extracting all available quantitative data, we translated qualitative information into bounded estimates and used multiple imputation to handle missing values, forming 287 records for which catch rate (snapper fisher⁻¹ h⁻¹) could be derived. Uncertainty was handled through a parametric maximum likelihood framework (a transformed trivariate Gaussian), which facilitated statistical comparisons between data sources. No statistically significant differences in catch rates were found among media sources and the government fisheries survey. Catch rates remained stable throughout the time series, averaging 3.75 snapper fisher⁻¹ h⁻¹ (95% confidence interval, 3.42–4.09) as the fishery expanded into new grounds. In comparison, a contemporary (1993–2002) south-east Queensland charter fishery produced an average catch rate of 0.4 snapper fisher⁻¹ h⁻¹ (95% confidence interval, 0.31–0.58). These data illustrate the productivity of a fishery during its earliest years of development and represent the earliest catch rate data globally for this species. By adopting a formalized approach to address issues common to many historical records – missing data, a lack of quantitative information and reporting bias – our analysis demonstrates the potential for historical narratives to contribute to contemporary fisheries management.

Relevance: 40.00%

Abstract:

In this paper we explore how reanimating a video data sequence with editing and creative software provided an opportunity for the data to speak and to demand new and surprising responses from us. Our data-ing brought new lines and spaces to the fore, through a process of refraction and re-animation which forced a focus on embodied inter-relationships and impeded precipitous analytical thought on the part of the researcher. We note how the aesthetic of the new images evoked awareness of our own part in the production of the object of our research. In particular, our own collegial interchange, punctuated by time and distance due to our respective locations on opposite sides of the globe, opened up a space for data-lingering in the intervening silences and pauses. Our choice of images engenders and reflects our sense of movement between the 'I' and the 'we' in their depiction of students' learning about space.

Relevance: 40.00%

Abstract:

A version of the Course Experience Questionnaire (CEQ) has been included in the Graduate Careers Council of Australia national survey of university graduates from 1993 onward. In addition to the quantitative response items noted above, the CEQ also includes an invitation to respondents to write open-ended comments on the best aspects (BA) of their university course experience and those aspects most needing improvement (NI).