960 results for Semi-supervised clustering
Abstract:
Vehicular traffic in urban areas may adversely affect urban water quality through the build-up of traffic-generated semi- and non-volatile organic compounds (SVOCs and NVOCs) on road surfaces. The characterisation of the build-up processes is the key to developing mitigation measures for the removal of such pollutants from urban stormwater. An in-depth analysis of the build-up of SVOCs and NVOCs was undertaken in the Gold Coast region in Australia. Principal Component Analysis (PCA) and Multicriteria Decision tools such as PROMETHEE and GAIA were employed to understand the SVOC and NVOC build-up under combined traffic scenarios of low, moderate, and high traffic in different land uses. It was found that congestion in the commercial areas and use of lubricants and motor oils in the industrial areas were the main sources of SVOCs and NVOCs on urban roads, respectively. The contribution from residential areas to the build-up of such pollutants was hardly noticeable. It was also revealed through this investigation that the target SVOCs and NVOCs were mainly attached to particulate fractions of 75 to 300 µm whilst the redistribution of coarse fractions due to vehicle activity mainly occurred in the >300 µm size range. Lastly, under the combined traffic scenario, moderate traffic with average daily traffic ranging from 2300 to 5900 and average congestion of 0.47 was found to dominate SVOC and NVOC build-up on roads.
Abstract:
Background: Waist circumference has been identified as a valuable predictor of cardiovascular risk in children. The development of waist circumference percentiles and cut-offs for various ethnic groups is necessary because of differences in body composition. The purpose of this study was to develop waist circumference percentiles for Chinese children and to explore optimal waist circumference cut-off values for predicting cardiovascular risk factors clustering in this population.
Methods: Height, weight, and waist circumference were measured in 5529 children (2830 boys and 2699 girls) aged 6-12 years randomly selected from southern and northern China. Blood pressure, fasting triglycerides, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, and glucose were obtained in a subsample (n = 1845). Smoothed percentile curves were produced using the LMS method. Receiver-operating characteristic analysis was used to derive the optimal age- and gender-specific waist circumference thresholds for predicting the clustering of cardiovascular risk factors.
Results: Gender-specific waist circumference percentiles were constructed. The waist circumference thresholds were at the 90th and 84th percentiles for Chinese boys and girls respectively, with sensitivity and specificity ranging from 67% to 83%. The odds ratio of a clustering of cardiovascular risk factors among boys and girls with values above the cut-off points was 10.349 (95% confidence interval 4.466 to 23.979) and 8.084 (95% confidence interval 3.147 to 20.767), respectively, compared with their counterparts.
Conclusions: Percentile curves for waist circumference of Chinese children are provided. The cut-off point for waist circumference to predict cardiovascular risk factors clustering is at the 90th and 84th percentiles for Chinese boys and girls, respectively.
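The receiver-operating characteristic step described in the Methods can be illustrated with a minimal sketch. The abstract does not say which criterion defined the "optimal" threshold, so the use of Youden's J (sensitivity + specificity - 1) here is an assumption, and the measurements and function names are made up for illustration only.

```python
def roc_points(values, labels):
    """Return (threshold, sensitivity, specificity) for every candidate cut-off.

    A child is classified as positive when value >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v >= threshold and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v < threshold and y == 0)
        points.append((threshold, tp / positives, tn / negatives))
    return points


def best_cutoff(values, labels):
    """Pick the threshold that maximises Youden's J (an assumed criterion)."""
    return max(roc_points(values, labels), key=lambda p: p[1] + p[2] - 1)


# Hypothetical waist circumferences (cm) and risk-clustering flags (1 = present).
waist = [52, 55, 57, 58, 60, 61, 63, 66, 68, 71]
risk = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

threshold, sensitivity, specificity = best_cutoff(waist, risk)
print(threshold, sensitivity, round(specificity, 3))
```

In a study like the one summarised above, the same scan would be run separately for each age and gender group, and the chosen threshold would then be mapped back to a percentile of the smoothed reference curves.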
Abstract:
The functional properties of cartilaginous tissues are determined predominantly by the content, distribution, and organization of proteoglycan and collagen in the extracellular matrix. Extracellular matrix accumulates in tissue-engineered cartilage constructs by metabolism and transport of matrix molecules, processes that are modulated by physical and chemical factors. Constructs incubated under free-swelling conditions with freely permeable or highly permeable membranes exhibit symmetric surface regions of soft tissue. The variation in tissue properties with depth from the surfaces suggests the hypothesis that the transport processes mediated by the boundary conditions govern the distribution of proteoglycan in such constructs. A continuum model (DiMicco and Sah in Transport Porous Med 50:57-73, 2003) was extended to test the effects of membrane permeability and perfusion on proteoglycan accumulation in tissue-engineered cartilage. The concentrations of soluble, bound, and degraded proteoglycan were analyzed as functions of time, space, and non-dimensional parameters for several experimental configurations. The results of the model suggest that the boundary condition at the membrane surface and the rate of perfusion, described by non-dimensional parameters, are important determinants of the pattern of proteoglycan accumulation. With perfusion, the proteoglycan profile is skewed, and decreases or increases in magnitude depending on the level of flow-based stimulation. Utilization of a semi-permeable membrane with or without unidirectional flow may lead to tissues with depth-increasing proteoglycan content, resembling native articular cartilage.
Abstract:
The traditional Vector Space Model (VSM) is not able to represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then utilizing it for clustering. Empirical analysis shows that the proposed method scales to large datasets; in addition, the factorized matrices produced by the proposed method help to improve the quality of clusters through the enriched document representation of both structure and content information.
Abstract:
Due to the limitations of current condition monitoring technologies, the estimates of asset health states may contain some uncertainties. A maintenance strategy that ignores this uncertainty in the asset health state can cause additional costs or downtime. The partially observable Markov decision process (POMDP) is a commonly used approach to derive optimal maintenance strategies when asset health inspections are imperfect. However, existing applications of the POMDP to maintenance decision-making largely adopt the discrete time and state assumptions. The discrete-time assumption requires that health state transitions and maintenance activities happen only at discrete epochs, which cannot model the failure time accurately and is not cost-effective. The discrete health state assumption, on the other hand, may not be elaborate enough to improve the effectiveness of maintenance. To address these limitations, this paper proposes a continuous state partially observable semi-Markov decision process (POSMDP). An algorithm that combines the Monte Carlo-based density projection method and the policy iteration is developed to solve the POSMDP. Different types of maintenance activities (i.e., inspections, replacement, and imperfect maintenance) are considered in this paper. The next maintenance action and the corresponding waiting durations are optimized jointly to minimize the long-run expected cost per unit time and availability. The results of simulation studies show that the proposed maintenance optimization approach is more cost-effective than maintenance strategies derived by two other approximate methods, when regular inspection intervals are adopted. The simulation study also shows that the maintenance cost can be further reduced by developing maintenance strategies with state-dependent maintenance intervals using the POSMDP. In addition, during the simulation studies the proposed POSMDP shows the ability to adopt a cost-effective strategy structure when multiple types of maintenance activities are involved.
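For orientation, the conventional discrete-state, discrete-time POMDP that the abstract contrasts itself with tracks a belief over health states and revises it after each imperfect inspection. The sketch below shows only that standard Bayes belief update; it is not the continuous-state POSMDP algorithm of the paper, and the three health states, the probabilities, and the function name are hypothetical.

```python
def belief_update(belief, transition, observation_likelihood):
    """One Bayes-filter step for a discrete-state POMDP.

    belief[i]                 : prior probability of health state i
    transition[i][j]          : P(next state j | current state i) for the chosen action
    observation_likelihood[j] : P(observed inspection result | next state j)
    """
    n = len(belief)
    # Predict: push the belief through the transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Correct: weight each state by how well it explains the inspection result.
    unnormalised = [predicted[j] * observation_likelihood[j] for j in range(n)]
    total = sum(unnormalised)
    if total == 0:
        raise ValueError("observation has zero probability under the current belief")
    return [u / total for u in unnormalised]


# Hypothetical asset with states 0 = good, 1 = degraded, 2 = failed.
transition = [
    [0.8, 0.15, 0.05],
    [0.0, 0.7, 0.3],
    [0.0, 0.0, 1.0],
]
# An imperfect inspection reports "degraded"; likelihood of that report per true state.
likelihood_degraded = [0.2, 0.7, 0.1]

posterior = belief_update([1.0, 0.0, 0.0], transition, likelihood_degraded)
print([round(p, 4) for p in posterior])
```

A continuous-state formulation replaces the probability vector with a density over the health state, which is why the abstract mentions a density projection method to keep the belief in a tractable form.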
Abstract:
The topic of fault detection and diagnostics (FDD) is studied from the perspective of proactive testing. Unlike most research in the diagnosis area, in which system outputs are analyzed for diagnosis purposes, this paper focuses on the other side of the problem: manipulating system inputs for better diagnostic reasoning. In other words, the question of how diagnostic mechanisms can direct system inputs for better diagnosis analysis is addressed here. It is shown how the problem can be formulated as a decision-making problem coupled with a Bayesian network-based diagnostic mechanism. The developed mechanism is applied to the problem of supervised testing in HVAC systems.
Abstract:
Kernel-based learning algorithms work by embedding the data into a Euclidean space, and then searching for linear relations among the embedded data points. The embedding is performed implicitly, by specifying the inner products between each pair of points in the embedding space. This information is contained in the so-called kernel matrix, a symmetric and positive semidefinite matrix that encodes the relative positions of all points. Specifying this matrix amounts to specifying the geometry of the embedding space and inducing a notion of similarity in the input space -- classical model selection problems in machine learning. In this paper we show how the kernel matrix can be learned from data via semi-definite programming (SDP) techniques. When applied to a kernel matrix associated with both training and test data this gives a powerful transductive algorithm -- using the labelled part of the data one can learn an embedding also for the unlabelled part. The similarity between test points is inferred from training points and their labels. Importantly, these learning problems are convex, so we obtain a method for learning both the model class and the function without local minima. Furthermore, this approach leads directly to a convex method to learn the 2-norm soft margin parameter in support vector machines, solving another important open problem. Finally, the novel approach presented in the paper is supported by positive empirical results.
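To make the notion of a kernel matrix concrete, the sketch below builds one for a fixed Gaussian (RBF) kernel and checks the two properties the abstract mentions: symmetry and a non-negative quadratic form. It does not implement the SDP-based learning of the kernel matrix; the choice of kernel, the `gamma` value, the sample points, and the function names are all assumptions made for illustration.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: an inner product of x and y in an implicit feature space."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def kernel_matrix(points, kernel):
    """Gram matrix K with K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


def quadratic_form(matrix, vector):
    """Compute v^T K v, which is non-negative for every v when K is positive semidefinite."""
    n = len(vector)
    return sum(vector[i] * matrix[i][j] * vector[j] for i in range(n) for j in range(n))


# Hypothetical 2-D input points.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
K = kernel_matrix(points, rbf_kernel)

for row in K:
    print([round(value, 3) for value in row])
```

In the transductive setting described above, the matrix would cover training and test points together, and the entries involving test points would be inferred from the labelled part rather than fixed in advance by a single kernel function.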
Abstract:
Background: Colorectal cancer (CRC) diagnosis and the ensuing treatments can have a substantial impact on the physical and psychological health of survivors. As the number of CRC survivors increases, so too does the need to develop viable rehabilitation programs to help these survivors return to good health as quickly as possible. Exercise has the potential to address many of the adverse effects of CRC treatment; however, to date, the role of exercise in the rehabilitation of cancer patients immediately after the completion of treatment has received limited research attention. This paper presents the design of a randomised controlled trial which will evaluate the feasibility and efficacy of a 12-week supervised aerobic exercise program (ImPACT Program) on the physiological and psychological markers of rehabilitation, in addition to biomarkers of standard haematological outcomes and the IGF axis.
Methods/Design: Forty CRC patients will be recruited through oncology clinics and randomised to an exercise group or a usual care control group. Baseline assessment will take place within 4 weeks of the patient completing adjuvant chemotherapy treatment. The exercise program for patients in the intervention group will commence a week after the baseline assessment. The program consists of three supervised moderate-intensity aerobic exercise sessions per week for 12 weeks. All participants will have assessments at baseline (0 wks), mid-intervention (6 wks), post-intervention (12 wks) and at a 6-week follow-up (18 wks). Outcome measures include cardio-respiratory fitness, biomarkers associated with health and survival, and indices of fatigue and quality of life. Process measures are participants' acceptability of, adherence to, and compliance with the exercise program, in addition to the safety of the program.
Discussion: The results of this study will provide valuable insight into the role of supervised exercise in improving life after CRC. Additionally, process analyses will inform the feasibility of implementing the program in a population of CRC patients immediately after completing chemotherapy.
Abstract:
Single particle analysis (SPA) coupled with high-resolution electron cryo-microscopy is emerging as a powerful technique for the structure determination of membrane protein complexes and soluble macromolecular assemblies. Current estimates suggest that ∼10^4–10^5 particle projections are required to attain a 3 Å resolution 3D reconstruction (symmetry dependent). Selecting this number of molecular projections differing in size, shape and symmetry is a rate-limiting step for the automation of 3D image reconstruction. Here, we present SwarmPS, a feature-rich, GUI-based software package to manage large-scale, semi-automated particle picking projects. The software provides cross-correlation and edge-detection algorithms. Algorithm-specific parameters are transparently and automatically determined through user interaction with the image, rather than by trial and error. Other features include multiple image handling (∼10^2), local and global particle selection options, interactive image freezing, automatic particle centering, and full manual override to correct false positives and negatives. SwarmPS is user-friendly, flexible, extensible, fast, and capable of exporting boxed-out projection images, or particle coordinates, compatible with downstream image processing suites.
Abstract:
This paper proposes the use of eigenvoice modeling techniques with the Cross Likelihood Ratio (CLR) as a criterion for speaker clustering within a speaker diarization system. The CLR has previously been shown to be a robust decision criterion for speaker clustering using Gaussian Mixture Models. Recently, eigenvoice modeling techniques have become increasingly popular, due to their ability to adequately represent a speaker based on sparse training data, as well as an improved capture of differences in speaker characteristics. This paper hence proposes that it would be beneficial to capitalize on the advantages of eigenvoice modeling in a CLR framework. Results obtained on the 2002 Rich Transcription (RT-02) Evaluation dataset show an improved clustering performance, resulting in a 35.1% relative improvement in the overall Diarization Error Rate (DER) compared to the baseline system.
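As a rough illustration of the CLR as a merge criterion, the sketch below scores two clusters by how well each one's data is explained by the other's model, relative to a background model, with each log-likelihood ratio normalised by the number of samples. To stay self-contained it substitutes a single 1-D Gaussian per cluster for the Gaussian Mixture or eigenvoice models named in the abstract; that substitution, the toy features, and the function names are assumptions, not the system evaluated in the paper.

```python
import math


def fit_gaussian(samples):
    """Maximum-likelihood mean and variance of a 1-D sample (variance floored for stability)."""
    mean = sum(samples) / len(samples)
    variance = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, max(variance, 1e-6)


def avg_log_likelihood(samples, model):
    """Average per-sample log-likelihood of the samples under a Gaussian model."""
    mean, variance = model
    return sum(
        -0.5 * (math.log(2 * math.pi * variance) + (x - mean) ** 2 / variance)
        for x in samples
    ) / len(samples)


def cross_likelihood_ratio(cluster_a, cluster_b, background):
    """CLR: each cluster scored on the other's model, relative to the background model."""
    model_a = fit_gaussian(cluster_a)
    model_b = fit_gaussian(cluster_b)
    model_bg = fit_gaussian(background)
    return (
        avg_log_likelihood(cluster_a, model_b) - avg_log_likelihood(cluster_a, model_bg)
        + avg_log_likelihood(cluster_b, model_a) - avg_log_likelihood(cluster_b, model_bg)
    )


# Hypothetical 1-D features: two segments from one speaker, one from another.
speaker1_a = [1.0, 1.2, 0.9, 1.1]
speaker1_b = [1.05, 0.95, 1.15, 1.0]
speaker2 = [4.0, 4.2, 3.9, 4.1]
background = speaker1_a + speaker1_b + speaker2

same = cross_likelihood_ratio(speaker1_a, speaker1_b, background)
different = cross_likelihood_ratio(speaker1_a, speaker2, background)
print(round(same, 2), round(different, 2))
```

A higher CLR suggests the two clusters come from the same speaker, so an agglomerative clustering loop would typically merge the highest-scoring pair and stop once no pair exceeds a chosen threshold.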
Abstract:
Dealing with product yield and quality in manufacturing industries is getting more difficult due to the increasing volume and complexity of data and quicker time-to-market expectations. Data mining offers tools for quick discovery of relationships, patterns and knowledge in large databases. The growing self-organizing map (GSOM) is established as an efficient unsupervised data mining algorithm. In this study some modifications to the original GSOM are proposed for manufacturing yield improvement by clustering. These modifications include the introduction of a clustering quality measure to evaluate the performance of the programme in separating good and faulty products and a filtering index to reduce noise from the dataset. Results show that the proposed method is able to effectively differentiate good and faulty products. It will help engineers construct the knowledge base to predict product quality automatically from collected data and provide insights for yield improvement.