43 results for 080109 Pattern Recognition and Data Mining
Abstract:
An assessment of the changes in the distribution and extent of mangroves within Moreton Bay, southeast Queensland, Australia, was carried out. Two assessment methods were evaluated: spatial and temporal pattern metrics analysis, and change detection analysis. Currently, about 15,000 ha of mangroves are present in Moreton Bay. These mangroves are important ecosystems, but are subject to disturbance from a number of sources. Over the past 25 years, there has been a loss of more than 3800 ha, as a result of natural losses and mangrove clearing (e.g. for urban and industrial development, agriculture and aquaculture). However, areas of new mangroves have become established over the same time period, offsetting these losses to create a net loss of about 200 ha. These new mangroves have mainly appeared in the southern bay region and the bay islands, particularly on the landward edge of existing mangroves. In addition, spatial patterns and species composition of mangrove patches have changed. The pattern metrics analysis provided an overview of mangrove distribution and change in the form of single metric values, while the change detection analysis gave a more detailed and spatially explicit description of change. An analysis of the effects of spatial scales on the pattern metrics indicated that they were relatively insensitive to scale at spatial resolutions less than 50 m, but that most metrics became sensitive at coarser resolutions, a finding which has implications for mapping of mangroves based on remotely sensed data. (C) 2003 Elsevier Science B.V. All rights reserved.
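The scale-sensitivity analysis mentioned above can be illustrated with a minimal sketch: a binary mangrove presence/absence raster is aggregated to progressively coarser cells and a simple pattern metric (here, patch count from connected-component labelling) is recomputed at each resolution. The raster, the aggregation factors and the choice of metric are assumptions for illustration, not the study's actual data or metric set.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
# Hypothetical 25 m resolution presence/absence raster (1 = mangrove).
mangrove = (rng.random((400, 400)) > 0.7).astype(int)

def aggregate(raster, factor):
    """Coarsen a binary raster by `factor` using a majority rule."""
    h, w = raster.shape
    trimmed = raster[: h - h % factor, : w - w % factor]
    blocks = trimmed.reshape(h // factor, factor, w // factor, factor)
    return (blocks.mean(axis=(1, 3)) >= 0.5).astype(int)

for factor in (1, 2, 4, 8):                # 25 m, 50 m, 100 m, 200 m cells
    coarse = aggregate(mangrove, factor)
    n_patches = ndimage.label(coarse)[1]   # connected-component patch count
    print(f"cell size {25 * factor:>4} m: {n_patches} patches")
```

Running such a loop over a set of metrics is one way to see at which cell size a metric's value starts to drift, which is the kind of sensitivity the abstract reports at resolutions coarser than 50 m.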
Abstract:
This paper proposes a novel application of fuzzy logic to web data mining for two basic problems of a website: popularity and satisfaction. Popularity means that people will visit the website, while satisfaction refers to the usefulness of the site. We illustrate that the popularity of a website is a fuzzy logic problem; it is an important characteristic for a website's survival in Internet commerce. The satisfaction of a website is also a fuzzy logic problem, representing the degree of success in applying information technology to the business. We propose a fuzzy logic framework for representing these two problems, using web data mining techniques to fuzzify the attributes of a website.
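As a rough illustration of what fuzzifying website attributes might look like, the sketch below maps two hypothetical web-mining attributes (daily visits and average session length) onto fuzzy membership grades and combines them with a min t-norm. The membership functions, attribute names and thresholds are assumptions, not the framework proposed in the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def popularity(daily_visits):
    # Hypothetical fuzzy set "popular" over daily visit counts (ramp from 100 to 1000).
    return min(1.0, max(0.0, (daily_visits - 100) / 900))

def satisfaction(avg_session_minutes):
    # Hypothetical fuzzy set "satisfying" over average session length.
    return triangular(avg_session_minutes, 1, 8, 30)

# Combine the two grades with a min t-norm into an overall "successful site" degree.
visits, session = 450, 6.5
print("popularity   =", round(popularity(visits), 2))
print("satisfaction =", round(satisfaction(session), 2))
print("overall      =", round(min(popularity(visits), satisfaction(session)), 2))
```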
Abstract:
Electricity market price forecasting is a challenging yet very important task for electricity market managers and participants. Due to the complexity and uncertainties in the power grid, electricity prices are highly volatile and normally carry spikes, which may be tens or even hundreds of times higher than the normal price. Such electricity price spikes are very difficult to predict. So far, most research on electricity price forecasting has been based on normal-range electricity prices. This paper proposes a data mining based electricity price forecast framework, which can predict the normal price as well as price spikes. The normal price is predicted by a previously proposed wavelet and neural network based forecast model, while the spikes are forecast using a data mining approach. This paper focuses on spike prediction and explores the reasons for price spikes based on the measurement of a proposed composite supply-demand balance index (SDI) and relative demand index (RDI). These indices reflect the relationship among electricity demand, electricity supply and electricity reserve capacity. The proposed model is based on a mining database including market clearing price, trading hour, electricity demand, electricity supply and reserve. Bayesian classification and similarity searching techniques are used to mine the database and find the internal relationships between electricity price spikes and these proposed indices. The mining results are used to form the price spike forecast model, which generates the forecasted price spike, the level of the spike and an associated forecast confidence level. The model is tested with Queensland electricity market data with promising results. Crown Copyright (C) 2004 Published by Elsevier B.V. All rights reserved.
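To make the spike-forecast idea concrete, here is a minimal sketch: two hypothetical balance indices are derived from demand, supply and reserve, and a naive Bayes classifier labels trading intervals as spike or non-spike. The index formulas, feature set and data are illustrative assumptions; they are not the paper's SDI/RDI definitions, its mining database, or its similarity-search step.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
n = 2000
demand  = rng.normal(6000, 800, n)           # MW, hypothetical
supply  = demand + rng.normal(500, 300, n)   # MW
reserve = np.clip(supply - demand, 0, None)

# Illustrative indices: a tighter supply-demand balance implies higher spike risk.
sdi = (supply - demand) / demand   # assumed composite supply-demand balance index
rdi = demand / demand.mean()       # assumed relative demand index

# Synthetic spike labels: spikes are more likely when reserve is thin and demand is high.
spike = ((sdi < 0.03) & (rdi > 1.05)) | (rng.random(n) < 0.02)

X = np.column_stack([sdi, rdi, reserve])
model = GaussianNB().fit(X[:1500], spike[:1500])

proba = model.predict_proba(X[1500:])[:, 1]   # forecast confidence for a spike
print("predicted spike rate:", (proba > 0.5).mean().round(3))
print("actual spike rate:   ", spike[1500:].mean().round(3))
```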
Abstract:
This special issue is a collection of selected papers published in the proceedings of the First International Conference on Advanced Data Mining and Applications (ADMA), held in Wuhan, China, in 2005. The articles focus on innovative applications of data mining approaches to problems that involve large data sets or incomplete and noisy data, or that demand optimal solutions.
Abstract:
Objective: An estimation of cut-off points for the diagnosis of diabetes mellitus (DM) based on individual risk factors. Methods: A subset of the 1991 Oman National Diabetes Survey is used, including all patients with a 2 h post glucose load >= 200 mg/dl (278 subjects) and a control group of 286 subjects. All subjects previously diagnosed as diabetic and all subjects with missing data values were excluded. The data set was analysed with the SPSS Clementine data mining system. Decision tree learners (C5 and CART) and a method for mining association rules (the GRI algorithm) were used. Fasting plasma glucose (FPG), age, sex, family history of diabetes and body mass index (BMI) are the input risk factors (independent variables), while diabetes onset (2 h post glucose load >= 200 mg/dl) is the output (dependent variable). All three techniques were tested by cross-validation (89.8%). Results: The rules produced for diabetes diagnosis are: A - GRI algorithm: (1) FPG >= 108.9 mg/dl; (2) FPG >= 107.1 mg/dl and age > 39.5 years. B - CART decision tree: FPG >= 110.7 mg/dl. C - C5 decision tree learner: (1) FPG >= 95.5 mg/dl and age > 54 years; (2) FPG >= 106 mg/dl and BMI > 25.2 kg/m2; (3) FPG >= 106 and =133 mg/dl. The three techniques produced rules covering a significant number of cases (82%), with confidence between 74 and 100%. Conclusion: Our approach supports the suggestion that the present cut-off value of fasting plasma glucose (126 mg/dl) for the diagnosis of diabetes mellitus needs revision, and that individual risk factors such as age and BMI should be considered in defining the new cut-off value.
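A minimal sketch of the CART-style part of this analysis, assuming a synthetic data set with the same input variables (FPG, age, sex, family history, BMI) and a binary diabetes label; the data are invented, so the learned thresholds will not match the study's cut-offs.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
n = 600
fpg  = rng.normal(105, 25, n)       # fasting plasma glucose, mg/dl
age  = rng.integers(20, 80, n)      # years
sex  = rng.integers(0, 2, n)        # 0 = female, 1 = male
famh = rng.integers(0, 2, n)        # family history of diabetes
bmi  = rng.normal(27, 5, n)         # kg/m2

# Synthetic outcome loosely tied to FPG, age and BMI (illustration only).
diabetic = (fpg + 0.3 * age + 0.8 * bmi + rng.normal(0, 10, n)) > 150

X = np.column_stack([fpg, age, sex, famh, bmi])
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=30).fit(X, diabetic)

# Print the induced rules, analogous to the FPG/age/BMI cut-off rules above.
print(export_text(tree, feature_names=["FPG", "age", "sex", "famhist", "BMI"]))
```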
Abstract:
Over recent years databases have become an extremely important resource for biomedical research. Immunology research is increasingly dependent on access to extensive biological databases to extract existing information, plan experiments, and analyse experimental results. This review describes 15 immunological databases that have appeared over the last 30 years. In addition, important issues regarding database design and the potential for misuse of information contained within these databases are discussed. Access pointers are provided for the major immunological databases and also for a number of other immunological resources accessible over the World Wide Web (WWW). (C) 2000 Elsevier Science B.V. All rights reserved.
Abstract:
Objective: To determine whether coinfection with sexually transmitted diseases (STD) increases HIV shedding in genital-tract secretions, and whether STD treatment reduces this shedding. Design: Systematic review and data synthesis of cross-sectional and cohort studies meeting predefined quality criteria. Main Outcome Measures: Proportion of patients with and without an STD who had detectable HIV in genital secretions, HIV load in genital secretions, or change following STD treatment. Results: Of 48 identified studies, three cross-sectional and three cohort studies were included. HIV was detected significantly more frequently in participants infected with Neisseria gonorrhoeae (125 of 309 participants, 41%) than in those without N. gonorrhoeae infection (311 of 988 participants, 32%; P = 0.004). HIV was not detected significantly more frequently in persons infected with Chlamydia trachomatis (28 of 67 participants, 42%) than in those without C. trachomatis infection (375 of 1149 participants, 33%; P = 0.13). Median HIV load, reported in only one study, was greater in men with urethritis (12.4 x 10(4) versus 1.51 x 10(4) copies/ml; P = 0.04). In the only cohort study in which this could be fully assessed, treatment of women with any STD reduced the proportion of those with detectable HIV from 39% to 29% (P = 0.05), whereas this proportion remained stable among controls (15-17%). A second cohort study reported fully on HIV load; among men with urethritis, viral load fell from 12.4 to 4.12 x 10(4) copies/ml 2 weeks post-treatment, whereas viral load remained stable in those without urethritis. Conclusion: Few high-quality studies were found. HIV is detected moderately more frequently in the genital secretions of men and women with an STD, and HIV load is substantially increased among men with urethritis. Successful STD treatment reduces both of these parameters, but not to control levels. More high-quality studies are needed to explore this important relationship further.
Abstract:
Research in conditioning (all the processes of preparation for competition) has used group research designs, where multiple athletes are observed at one or more points in time. However, empirical reports of large inter-individual differences in response to conditioning regimens suggest that applied conditioning research would greatly benefit from single-subject research designs. Single-subject research designs allow us to find out the extent to which a specific conditioning regimen works for a specific athlete, as opposed to the average athlete, who is the focal point of group research designs. The aim of the following review is to outline the strategies and procedures of single-subject research as they pertain to the assessment of conditioning for individual athletes. The four main experimental designs in single-subject research are the AB design, reversal (withdrawal) designs and their extensions, multiple baseline designs, and alternating treatment designs. Visual and statistical analyses commonly used to analyse single-subject data are discussed, along with their advantages and limitations. Modelling of multivariate single-subject data using techniques such as dynamic factor analysis and structural equation modelling may identify individualised models of conditioning, leading to better prediction of performance. Despite problems associated with data analyses in single-subject research (e.g. serial dependency), sports scientists should use single-subject research designs in applied conditioning research to understand how well an intervention (e.g. a training method) works and to predict performance for a particular athlete.
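As a small illustration of the statistical side of an AB design, the sketch below compares a hypothetical athlete's baseline (A) and intervention (B) phases using a two-standard-deviation band, one analysis commonly applied to single-subject data. The performance values and the band criterion are assumptions, not data or prescriptions from the review.

```python
import numpy as np

# Hypothetical weekly performance scores for one athlete.
baseline     = np.array([52.1, 51.8, 52.5, 51.9, 52.3, 52.0])   # A phase
intervention = np.array([52.8, 53.4, 53.9, 54.1, 53.7, 54.4])   # B phase

mean_a, sd_a = baseline.mean(), baseline.std(ddof=1)
upper_band = mean_a + 2 * sd_a   # two-standard-deviation band from the baseline phase

# A common heuristic: two or more consecutive B-phase points beyond the band
# suggest a treatment effect for this particular athlete.
beyond = intervention > upper_band
print("baseline mean:", round(mean_a, 2), "| 2-SD band:", round(upper_band, 2))
print("B-phase points beyond band:", int(beyond.sum()), "of", len(intervention))
```

Note that serial dependency in such data, mentioned in the abstract, can inflate the apparent effect; the two-standard-deviation band is a visual aid rather than a formal test.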
Abstract:
The aim in the current study was to investigate the emergence of pretend play, mirror self-recognition, synchronic imitation and deferred imitation in normally developing human infants. A longitudinal study was conducted with 98 infants seen at three-monthly intervals from 12 through to 24 months of age. At each session the infants were tested on a range of tasks assessing the four target skills. Deferred imitation was found to emerge prior to synchronic imitation, pretend play and mirror self-recognition. In contrast, the latter three skills emerged between 18 and 21 months and followed similar developmental trajectories. Deferred imitation was found to hold a prerequisite relation with these three skills. Synchronic imitation, pretend play and mirror self-recognition were not closely associated and no prerequisite relations were found between these skills. These findings are discussed in the context of current theories regarding the development of pretend play, mirror self-recognition, synchronic imitation and deferred imitation in the second year. (C) 2004 Elsevier Inc. All rights reserved.
Abstract:
Although smoking is widely recognized as a major cause of cancer, there is little information on how it contributes to the global and regional burden of cancers in combination with other risk factors that affect background cancer mortality patterns. We used data from the American Cancer Society's Cancer Prevention Study II (CPS-II) and the WHO and IARC cancer mortality databases to estimate deaths from 8 clusters of site-specific cancers caused by smoking, for 14 epidemiologic subregions of the world, by age and sex. We used lung cancer mortality as an indirect marker for accumulated smoking hazard. CPS-II hazards were adjusted for important covariates. In the year 2000, an estimated 1.42 (95% CI 1.27-1.57) million cancer deaths in the world, 21% of total global cancer deaths, were caused by smoking. Of these, 1.18 million deaths were among men and 0.24 million among women; 625,000 (95% CI 485,000-749,000) smoking-caused cancer deaths occurred in the developing world and 794,000 (95% CI 749,000-840,000) in industrialized regions. Lung cancer accounted for 60% of smoking-attributable cancer mortality, followed by cancers of the upper aerodigestive tract (20%). Based on available data, more than one in every 5 cancer deaths in the world in the year 2000 were caused by smoking, making it possibly the single largest preventable cause of cancer mortality. There was significant variability across regions in the role of smoking as a cause of the different site-specific cancers. This variability illustrates the importance of coupling research and surveillance of smoking with that for other risk factors for more effective cancer prevention. (C) 2005 Wiley-Liss, Inc.
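The attribution step in this kind of analysis can be sketched with standard population attributable fraction arithmetic. The formula and numbers below are a generic PAF calculation with invented inputs; they are not the CPS-II-based indirect (lung-cancer marker) method the study actually applies.

```python
def attributable_deaths(prevalence, relative_risk, total_deaths):
    """Deaths attributable to an exposure via the population attributable fraction:
    PAF = p(RR - 1) / (1 + p(RR - 1))."""
    paf = prevalence * (relative_risk - 1) / (1 + prevalence * (relative_risk - 1))
    return paf, paf * total_deaths

# Hypothetical inputs: 30% smoking prevalence, relative risk of 10 for lung cancer,
# and 1,000,000 lung cancer deaths in a region.
paf, deaths = attributable_deaths(0.30, 10.0, 1_000_000)
print(f"PAF = {paf:.2f}, attributable deaths = {deaths:,.0f}")
```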
Abstract:
Networked information and communication technologies are rapidly advancing the capacities of governments to target and separately manage specific sub-populations, groups and individuals. Targeting uses data profiling to calculate the differential probabilities of outcomes associated with various personal characteristics. This knowledge is used to classify and sort people for differentiated levels of treatment. Targeting is often used to direct government resources efficiently and effectively to the most disadvantaged. Although it has many benefits, targeting raises several policy and ethical issues. This paper discusses these issues and the policy responses governments may take to maximise the benefits of targeting while ameliorating the negative aspects.
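A toy sketch of the profiling-and-sorting mechanism described above: a logistic regression estimates each person's probability of an outcome from a few hypothetical characteristics, and the population is then sorted into service tiers by that probability. All variable names, thresholds and data here are invented for illustration and do not come from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 1000
income    = rng.normal(50, 15, n)    # hypothetical characteristics
age       = rng.integers(18, 90, n)
prior_use = rng.integers(0, 2, n)    # prior use of a service

# Synthetic outcome: probability of needing assistance (illustration only).
need = rng.random(n) < 1 / (1 + np.exp(0.05 * (income - 40) - 0.5 * prior_use))

X = np.column_stack([income, age, prior_use])
model = LogisticRegression(max_iter=1000).fit(X, need)

# Sort the population into tiers by predicted probability of need.
risk = model.predict_proba(X)[:, 1]
tiers = np.digitize(risk, [0.33, 0.66])   # 0 = low, 1 = medium, 2 = high priority
print("people per tier:", np.bincount(tiers, minlength=3))
```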