925 results for Data clustering. Fuzzy C-Means. Cluster centers initialization. Validation indices
Abstract:
OBJECTIVE To assess the scientific activity and information output of the journal Nutrición Hospitalaria over the period 2001-2005 by means of a bibliometric study. METHOD Cross-sectional descriptive study of the articles published in the journal Nutrición Hospitalaria. The data were obtained by consulting the electronic version on the Web; where a broken link made the electronic document inaccessible, the printed version was consulted. All document types were included except conference communications. RESULTS A total of 345 articles were published, 187 (54.20%) of them original articles. The first author was Spanish in 287 articles (83.19%) and Latin American in 27 (7.83%). The largest share of articles came from health care centers (172 articles, 49.86%), and the cooperation index was 4.15. Madrid is the most productive province in both absolute and adjusted frequencies. The median number of references per article is 18 and the mean is 23.52 (95% CI 20.93-26.10). The predominant language was Spanish, with 308 articles (89.28%). CONCLUSION Nutrición Hospitalaria may be considered a reference journal for information and scientific communication on nutrition in both the Spanish and the Latin American communities. The bibliometric parameters studied are comparable with those of the other leading Spanish health-science journals.
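The percentages above are simple shares of the 345 articles, and the reference statistics are a mean with a 95% interval. A minimal sketch of that arithmetic follows; it assumes the cooperation index is the mean number of authors per article (the abstract does not define it), and the per-article data passed to `cooperation_index` and `mean_ci95` are hypothetical because the abstract reports only aggregates.

```python
import math
from statistics import mean, stdev

def percentage(part, total):
    """Share of `part` in `total`, as a percentage rounded to two decimals."""
    return round(100.0 * part / total, 2)

def cooperation_index(authors_per_article):
    """Assumed definition: mean number of authors per article."""
    return sum(authors_per_article) / len(authors_per_article)

def mean_ci95(values):
    """Mean with a normal-approximation 95% CI (1.96 standard errors)."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width

TOTAL_ARTICLES = 345
for label, count in [("original articles", 187), ("first author Spanish", 287),
                     ("first author Latin American", 27), ("health care centers", 172),
                     ("written in Spanish", 308)]:
    print(f"{label}: {count}/{TOTAL_ARTICLES} = {percentage(count, TOTAL_ARTICLES)}%")
```

Running the loop reproduces the shares quoted in the abstract (54.2, 83.19, 7.83, 49.86 and 89.28).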
Abstract:
A methodology for the exploratory data analysis of orographic precipitation enhancement is proposed. Precipitation observations from three Swiss Doppler weather radars are analysed for the major precipitation event of August 2005 in the Alps. Image processing techniques are used to detect significant precipitation cells/pixels in the radar images while filtering out spurious effects due to ground clutter. The contribution of topography to precipitation patterns is described by an extensive set of topographical descriptors computed from the digital elevation model at multiple spatial scales. Additionally, the motion vector field is derived from successive radar images and integrated into a set of topographic features to highlight the slopes exposed to the main flows. An exploratory analysis with a recent spectral clustering algorithm shows that orographic precipitation cells are generated under specific flow and topographic conditions. The repeatability of precipitation patterns at particular locations is found to be linked to specific local terrain shapes, e.g. at the tops of hills and on the upwind side of mountains. This methodology and our empirical findings for the Alpine region provide a basis for building computational data-driven models of orographic enhancement and triggering of precipitation. Copyright (C) 2011 Royal Meteorological Society.
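The abstract does not name its "recent algorithm of spectral clustering", so the sketch below shows the classic normalized spectral clustering of Ng, Jordan and Weiss as a stand-in, not the authors' method. It assumes each row of `X` is one radar pixel described by topographic and flow features; the kernel width `sigma` and the synthetic two-group data are hypothetical.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, n_iter=50):
    """Normalized spectral clustering in the style of Ng, Jordan and Weiss."""
    X = np.asarray(X, dtype=float)
    # Gaussian (RBF) affinity between every pair of feature vectors
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Symmetrically normalized affinity D^{-1/2} A D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Eigenvectors of the k largest eigenvalues (eigh returns ascending order)
    _, vecs = np.linalg.eigh(M)
    U = vecs[:, -k:]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # k-means on the embedded rows, with deterministic farthest-first initialization
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(U - c, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(np.linalg.norm(U[:, None, :] - centers[None, :, :], axis=-1), axis=1)
        centers = np.array([U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels

# Hypothetical pixels: two well-separated groups of 2-D descriptors
rng = np.random.default_rng(0)
pixels = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
labels = spectral_clustering(pixels, k=2)
```

Clustering in the eigenvector embedding rather than in the raw feature space is what lets the method separate groups that are connected but not compact.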
Abstract:
This project investigates both the use of clustering techniques to find predictors and the free data mining software available. The research is based on a case study in which, in addition to the free KDD software used by the scientific community, a new free tool for data pre-processing is presented. The predictors are intended for the e-learning domain, since the data from which they are inferred are student grades from different e-learning environments. Through our case study, we not only test clustering algorithms but also propose additional goals.
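The abstract does not say which clustering algorithms were tested. As one illustration tied to the search terms above, here is a minimal Fuzzy C-Means sketch with farthest-first center initialization and Bezdek's partition coefficient as a validation index; these choices, the fuzzifier `m=2.0`, and the two-column synthetic grade data are assumptions, not the project's setup.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6):
    """Fuzzy C-Means; returns the cluster centers and the membership matrix U."""
    X = np.asarray(X, dtype=float)
    # Farthest-first initialization: start from the first sample, then add
    # the sample farthest from all centers chosen so far
    centers = [X[0]]
    for _ in range(1, c):
        d = np.min([np.linalg.norm(X - ctr, axis=1) for ctr in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        dist = np.fmax(dist, 1e-12)  # guard against a sample sitting on a center
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)
        # Center update: mean of the samples weighted by u_ij^m
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.max(np.abs(new_centers - centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, U

def partition_coefficient(U):
    """Bezdek's partition coefficient: 1/c (fuzziest) up to 1 (crisp partition)."""
    return float(np.mean(np.sum(U ** 2, axis=1)))

# Hypothetical grades (0-10) of 30 students in two courses
rng = np.random.default_rng(0)
grades = np.vstack([rng.normal(3.0, 0.5, (15, 2)), rng.normal(8.0, 0.5, (15, 2))])
centers, U = fuzzy_c_means(grades, c=2)
print(centers.round(1), partition_coefficient(U))
```

Comparing the partition coefficient across several values of `c` is one simple way to choose the number of student groups.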
Abstract:
BACKGROUND Functional brain images such as Single-Photon Emission Computed Tomography (SPECT) and Positron Emission Tomography (PET) have been widely used to guide clinicians in the diagnosis of Alzheimer's Disease (AD). However, the subjectivity involved in their evaluation has favoured the development of Computer Aided Diagnosis (CAD) systems. METHODS A novel combination of feature extraction techniques is proposed to improve the diagnosis of AD. First, Regions of Interest (ROIs) are selected by means of a t-test carried out on 3D Normalised Mean Square Error (NMSE) features restricted to a predefined brain activation mask. To address the small sample-size problem, the dimension of the feature space was further reduced by Large Margin Nearest Neighbours using a rectangular matrix (LMNN-RECT), Principal Component Analysis (PCA) or Partial Least Squares (PLS) (the latter two also analysed with an LMNN transformation). As classifiers, kernel Support Vector Machines (SVMs) and LMNN using Euclidean, Mahalanobis and Energy-based metrics were compared. RESULTS Several experiments were conducted to evaluate the proposed LMNN-based feature extraction algorithms and their benefits as: i) a linear transformation of the PLS- or PCA-reduced data, ii) a feature reduction technique, and iii) a classifier (with Euclidean, Mahalanobis or Energy-based methodology). The system was evaluated by k-fold cross-validation, yielding accuracy, sensitivity and specificity values of 92.78%, 91.07% and 95.12% (for SPECT) and 90.67%, 88% and 93.33% (for PET), respectively, when an NMSE-PLS-LMNN feature extraction method was used in combination with an SVM classifier, thus outperforming recently reported baseline methods. CONCLUSIONS All the proposed methods turned out to be valid solutions for the presented problem.
One advance is the robustness of the LMNN algorithm, which not only provides a higher separation rate between the classes but also (in combination with NMSE and PLS) makes the variation of this rate more stable. In addition, the methods' generalization ability is another advance, since the experiments were performed on two image modalities (SPECT and PET).
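Accuracy, sensitivity and specificity from k-fold cross-validation can be computed as sketched below. The study's pipeline (NMSE-PLS-LMNN features with an SVM) is replaced here by a hypothetical nearest-class-mean classifier so the example stays self-contained; the fold scheme (shuffled, unstratified) and the synthetic data are also assumptions.

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validated_predictions(X, y, k, fit_predict, seed=0):
    """Predict every sample with a model trained on the other k-1 folds."""
    y_pred = np.empty_like(y)
    for test_idx in kfold_indices(len(y), k, seed):
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        y_pred[test_idx] = fit_predict(X[train_mask], y[train_mask], X[test_idx])
    return y_pred

def nearest_mean(X_train, y_train, X_test):
    """Hypothetical stand-in classifier: assign each sample to the closest class mean."""
    classes = np.unique(y_train)
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_test[:, None, :] - means[None, :, :], axis=-1)
    return classes[np.argmin(d, axis=1)]

def diagnostic_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos = y_true == positive
    tp = np.sum(pos & (y_pred == positive))
    tn = np.sum(~pos & (y_pred != positive))
    fn = np.sum(pos & (y_pred != positive))
    fp = np.sum(~pos & (y_pred == positive))
    return {"accuracy": (tp + tn) / len(y_true),
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp)}

# Hypothetical reduced features: 20 controls (label 0) and 20 AD patients (label 1)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 3)), rng.normal(2.0, 0.3, (20, 3))])
y = np.array([0] * 20 + [1] * 20)
y_pred = cross_validated_predictions(X, y, k=5, fit_predict=nearest_mean)
print(diagnostic_metrics(y, y_pred))
```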
Abstract:
BACKGROUND: In Switzerland, intravenous drug use (IDU) accounts for 80% of newly acquired hepatitis C virus (HCV) infections. Early HCV treatment has the potential to interrupt the transmission chain and to reduce morbidity and mortality due to decompensated liver cirrhosis and hepatocellular carcinoma. Nevertheless, patients in drug substitution programs are often insufficiently screened and treated. OBJECTIVE/METHODS: With the aim of improving HCV management in IDUs, we conducted a cross-sectional chart review in three opioid substitution programs in St. Gallen (125 methadone and 71 heroin recipients). Results were compared with another heroin substitution program in Bern (202 patients) and with SCCS/SHCS data. RESULTS: Among the methadone/heroin recipients in St. Gallen, the diagnostic workup of HCV was better than expected: HCV/HIV status was unknown in only 1% (2/196), HCV RNA testing was not performed in 9% (13/146) of anti-HCV-positive patients, and the genotype was missing in 15% (12/78) of HCV RNA-positive patients. In those without spontaneous clearance (two thirds), HCV treatment uptake was 23% (21/91) (HIV-negative: 29% (20/68); HIV-positive: 4% (1/23)). This was lower than among methadone/heroin recipients, and particularly non-IDUs, within the SCCS/SHCS, but higher than in the mainly psychiatrically focused heroin substitution program in Bern (8%). Sustained virological response (SVR) rates were comparable in all settings (overall: 50%; genotype 1: 35-40%; genotype 3: two thirds). In St. Gallen, the median delay from the estimated date of infection (start of IDU) to first diagnosis was 10 years, and to treatment another 7.5 years. CONCLUSIONS: Future efforts need to focus on earlier HCV diagnosis and on improving treatment uptake among patients in drug substitution programs, particularly if patients are HIV-co-infected. New potent drugs might facilitate the decision to initiate treatment.
Abstract:
Time-lapse geophysical data acquired during transient hydrological experiments are being increasingly employed to estimate subsurface hydraulic properties at the field scale. In particular, crosshole ground-penetrating radar (GPR) data, collected while water infiltrates into the subsurface either by natural or artificial means, have been demonstrated in a number of studies to contain valuable information concerning the hydraulic properties of the unsaturated zone. Previous work in this domain has considered a variety of infiltration conditions and different amounts of time-lapse GPR data in the estimation procedure. However, the particular benefits and drawbacks of these different strategies as well as the impact of a variety of key and common assumptions remain unclear. Using a Bayesian Markov-chain-Monte-Carlo stochastic inversion methodology, we examine in this paper the information content of time-lapse zero-offset-profile (ZOP) GPR traveltime data, collected under three different infiltration conditions, for the estimation of van Genuchten-Mualem (VGM) parameters in a layered subsurface medium. Specifically, we systematically analyze synthetic and field GPR data acquired under natural loading and two rates of forced infiltration, and we consider the value of incorporating different amounts of time-lapse measurements into the estimation procedure. Our results confirm that, for all infiltration scenarios considered, the ZOP GPR traveltime data contain important information about subsurface hydraulic properties as a function of depth, with forced infiltration offering the greatest potential for VGM parameter refinement because of the higher stressing of the hydrological system. Considering greater amounts of time-lapse data in the inversion procedure is also found to help refine VGM parameter estimates. 
Quite importantly, however, inconsistencies observed in the field results point to the strong possibility that posterior uncertainties are being influenced by model structural errors, which in turn underlines the fundamental importance of a systematic analysis of such errors in future related studies.
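To make the estimation target concrete, the sketch below codes the standard van Genuchten retention curve and Mualem-van Genuchten conductivity, plus a basic random-walk Metropolis sampler. It is a deliberate simplification: the study's forward model maps VGM parameters to ZOP GPR traveltimes through an infiltration simulation, whereas here (as an assumption, for illustration only) noisy water contents at known pressure heads are observed directly, and only `alpha` and `n` are sampled with hypothetical prior bounds.

```python
import numpy as np

def vgm_water_content(h, theta_r, theta_s, alpha, n):
    """van Genuchten retention curve: water content at pressure head h (h <= 0)."""
    m = 1.0 - 1.0 / n
    return theta_r + (theta_s - theta_r) / (1.0 + (alpha * np.abs(h)) ** n) ** m

def vgm_conductivity(h, k_s, alpha, n, tortuosity=0.5):
    """Mualem-van Genuchten unsaturated hydraulic conductivity K(h)."""
    m = 1.0 - 1.0 / n
    se = (1.0 + (alpha * np.abs(h)) ** n) ** (-m)          # effective saturation
    return k_s * se ** tortuosity * (1.0 - (1.0 - se ** (1.0 / m)) ** m) ** 2

def metropolis(log_post, x0, step, n_samples, seed=0):
    """Random-walk Metropolis sampler with isotropic Gaussian proposals."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_post(x)
    chain = np.empty((n_samples, x.size))
    for i in range(n_samples):
        proposal = x + step * rng.standard_normal(x.size)
        lp_prop = log_post(proposal)
        # Accept with probability min(1, exp(lp_prop - lp))
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = proposal, lp_prop
        chain[i] = x
    return chain

# Hypothetical synthetic experiment: noisy water contents at known heads (m)
rng = np.random.default_rng(1)
heads = -np.logspace(-2, 1, 20)
THETA_R, THETA_S, NOISE = 0.05, 0.40, 0.005
observed = vgm_water_content(heads, THETA_R, THETA_S, 3.0, 2.0) + rng.normal(0, NOISE, heads.size)

def log_posterior(params):
    alpha, n = params
    if not (0.1 < alpha < 20.0 and 1.05 < n < 5.0):       # uniform prior bounds
        return -np.inf
    resid = observed - vgm_water_content(heads, THETA_R, THETA_S, alpha, n)
    return -0.5 * np.sum((resid / NOISE) ** 2)             # Gaussian likelihood

chain = metropolis(log_posterior, x0=[2.0, 1.8], step=0.02, n_samples=8000)
alpha_est, n_est = chain[2000:].mean(axis=0)               # discard burn-in
```

The spread of `chain[2000:]` plays the role of the posterior uncertainty discussed above; with a misspecified forward model it would be misleadingly narrow, which is the structural-error concern the abstract raises.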
Abstract:
Globalization involves several facility location problems that need to be handled at a large scale. Location Allocation (LA) is a combinatorial problem in which the distances among points in the data space matter. Taking advantage of this distance property of the domain, we exploit the ability of clustering techniques to partition the data space, converting an initial large LA problem into several simpler LA problems. In particular, our motivating problem involves a very large geographical area that can be partitioned under overall conditions. We present different types of clustering techniques and then perform a cluster analysis of our dataset to partition it. We then solve the LA problem by applying a simulated annealing algorithm to both the clustered and the non-clustered data, to determine how beneficial the clustering is and which of the presented methods is the most suitable.
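The partition-then-solve strategy can be sketched as follows, assuming the LA problem is a p-median problem (minimize total distance from demand points to their nearest open facility) with candidate sites equal to the demand points. The k-means partitioning, the swap-based annealing move, the cooling schedule and the facilities-per-cluster split are all illustrative assumptions, not the paper's configuration.

```python
import math
import random

def kmeans(points, k, n_iter=20, seed=0):
    """Plain k-means on 2-D points; returns the list of point groups."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Move each center to the mean of its group (keep it if the group is empty)
        centers = [tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[j]
                   for j, g in enumerate(groups)]
    return groups

def p_median_cost(points, facilities):
    """Sum of distances from each demand point to its nearest open facility."""
    return sum(min(math.dist(p, f) for f in facilities) for p in points)

def anneal_p_median(points, p, n_iter=2000, t0=10.0, cooling=0.995, seed=0):
    """Simulated annealing for the p-median problem; candidate sites = demand points."""
    if p >= len(points):
        return list(points), 0.0          # every point can host its own facility
    rng = random.Random(seed)
    current = rng.sample(range(len(points)), p)
    cost = p_median_cost(points, [points[i] for i in current])
    best, best_cost = list(current), cost
    t = t0
    for _ in range(n_iter):
        # Neighbour move: swap one open site for a randomly chosen closed site
        candidate = list(current)
        closed = [i for i in range(len(points)) if i not in current]
        candidate[rng.randrange(p)] = rng.choice(closed)
        new_cost = p_median_cost(points, [points[i] for i in candidate])
        # Always accept improvements; accept worse moves with probability exp(-delta / t)
        if new_cost < cost or rng.random() < math.exp((cost - new_cost) / t):
            current, cost = candidate, new_cost
            if cost < best_cost:
                best, best_cost = list(current), cost
        t *= cooling
    return [points[i] for i in best], best_cost

# Hypothetical demand points; 6 facilities overall vs 2 per cluster in 3 clusters
rng = random.Random(1)
demand = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(150)]
_, whole_cost = anneal_p_median(demand, p=6)
groups = kmeans(demand, k=3)
clustered_cost = sum(anneal_p_median(g, p=2)[1] for g in groups)
print(f"non-clustered: {whole_cost:.1f}  clustered: {clustered_cost:.1f}")
```

Each subproblem evaluates far fewer point-facility distances per move, which is where the clustering pays off on large instances; the price is that points can only be served by facilities in their own cluster.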
Abstract:
Atherogenic dyslipidemia, manifested by low HDL-cholesterol and high TG levels, is an important component of the ATP-III-defined metabolic syndrome. Here, we dissected the phenotypic and genetic architecture of these traits by assessing their relationships with other metabolically relevant measures, including plasma adipocytokines, highly sensitive C-reactive protein (hsCRP) and LDL particle size, in a large family data set (n=2800) and in an independent set of dyslipidemic cases (n=716) and normolipidemic controls (n=1073). We explored the relationships among these phenotypes using variable clustering and then estimated their heritabilities and cross-trait genetic correlations. In families, four clusters explained 61% of the total variance: one adiposity-related cluster (including hsCRP), one BP-related cluster, and two lipid-related clusters (HDL-C, TG, adiponectin and LDL particle size; apoB and non-HDL-C). A similar structure was observed in dyslipidemic cases and normolipidemic controls. The genetic correlations in the families largely paralleled the phenotype clustering results, suggesting that common genes with pleiotropic effects contribute to the observed correlations. In summary, our analyses support a model of metabolic syndrome with two major components, body fat and lipids, each with two subcomponents, and quantify their degree of overlap with each other and with metabolic-syndrome-related measures (adipokines, LDL particle size and hsCRP).
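A statement such as "four clusters explained 61% of the total variance" can be computed, in VARCLUS-style variable clustering, by summing the largest eigenvalue of each cluster's correlation submatrix and dividing by the number of variables. Whether the study used exactly this criterion is an assumption; the sketch takes a given partition of variables and uses hypothetical data with two latent traits.

```python
import numpy as np

def cluster_variance_explained(X, clusters):
    """Share of total standardized variance captured by the first principal
    component of each variable cluster (VARCLUS-style criterion)."""
    R = np.corrcoef(X, rowvar=False)
    explained = 0.0
    for idx in clusters:
        sub = R[np.ix_(idx, idx)]
        explained += np.linalg.eigvalsh(sub)[-1]   # largest eigenvalue of the block
    return explained / R.shape[0]

# Hypothetical data: two latent traits, each measured by three noisy variables
rng = np.random.default_rng(0)
fat, lipids = rng.normal(size=(2, 500))
X = np.column_stack([fat + 0.5 * rng.normal(size=500) for _ in range(3)] +
                    [lipids + 0.5 * rng.normal(size=500) for _ in range(3)])
good = cluster_variance_explained(X, [[0, 1, 2], [3, 4, 5]])
mixed = cluster_variance_explained(X, [[0, 3, 1], [2, 4, 5]])
print(round(good, 2), round(mixed, 2))
```

A partition that groups variables driven by the same underlying trait explains more variance than one that mixes them, which is the basis for comparing candidate clusterings.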
Abstract:
An adaptation technique based on the synoptic atmospheric circulation to forecast local precipitation, namely the analogue method, has been implemented for the western Swiss Alps. During the calibration procedure, relevance maps were established for the geopotential height data. These maps highlight the locations where the synoptic circulation was found to be relevant for forecasting precipitation at two rain gauge stations (Binn and Les Marécottes), both located in the Alpine Rhône catchment about 100 km apart. These two stations are sensitive to different atmospheric circulations. We have observed that the most relevant data for the analogue method are found where specific atmospheric circulation patterns appear concomitantly with heavy precipitation events. These skilful regions are coherent with the atmospheric flows illustrated, for example, by the back trajectories of air masses. Indeed, the circulation recurrently diverges from the climatology on days with strong precipitation in the southern part of the Alpine Rhône catchment. We have found that of the 152 days with precipitation above 50 mm at the Binn station, only 3 did not show a southerly flow trajectory, meaning that such a circulation was present for 98% of the events. The time evolution of the relevance maps confirms that the atmospheric circulation variables have significantly better forecasting skill close to the precipitation period, and that it seems pointless for the analogue method to use circulation information from days before a precipitation event as a primary predictor. Even though the occurrence of some critical circulation patterns leading to heavy precipitation events can be detected by precursors at remote locations and 1 week ahead (Grazzini, 2007; Martius et al., 2008), time extrapolation by the analogue method seems to be rather poor.
This suggests, in accordance with previous studies (Obled et al., 2002; Bontron and Obled, 2005), that time extrapolation should be done by a global circulation model, whose forecast atmospheric variables can then be used by the adaptation method.
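The core of the analogue method is ranking archive days by the similarity of their circulation fields to the target day and taking the precipitation observed on the best analogues as the forecast. The sketch uses the Teweles-Wobus S1 score on gridded geopotential heights, a criterion common in the analogue-method literature; the abstract does not state which criterion, domain or number of analogues this study used, so those are assumptions, and the fields below are hypothetical.

```python
import numpy as np

def s1_score(field_a, field_b):
    """Teweles-Wobus S1 score between two gridded fields (0 = identical gradients)."""
    num = 0.0
    den = 0.0
    for axis in (0, 1):                      # gradients along both grid directions
        ga = np.diff(field_a, axis=axis)
        gb = np.diff(field_b, axis=axis)
        num += np.abs(ga - gb).sum()
        den += np.maximum(np.abs(ga), np.abs(gb)).sum()
    return 100.0 * num / den

def analogue_forecast(target_field, archive_fields, archive_precip, n_analogues=30):
    """Precipitation observed on the n most similar archive days (empirical forecast distribution)."""
    scores = np.array([s1_score(target_field, f) for f in archive_fields])
    best = np.argsort(scores)[:n_analogues]
    return np.asarray(archive_precip)[best]
```

Because S1 compares gradients rather than absolute heights, two fields that differ only by a constant offset score 0, i.e. the method matches the shape of the circulation (and hence the flow direction) rather than its absolute level.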
Abstract:
CONTEXT: Plasma levels of C-reactive protein (CRP) are independently associated with risk of coronary heart disease, but whether CRP is causally associated with coronary heart disease or merely a marker of underlying atherosclerosis is uncertain. OBJECTIVE: To investigate the association of genetic loci with CRP levels and risk of coronary heart disease. DESIGN, SETTING, AND PARTICIPANTS: We first carried out a genome-wide association study (n = 17,967) and a replication study (n = 13,615) to identify genetic loci associated with plasma CRP concentrations. Data collection took place between 1989 and 2008 and genotyping between 2003 and 2008. We carried out a mendelian randomization study of the most closely associated single-nucleotide polymorphism (SNP) in the CRP locus and published data on other CRP variants, involving a total of 28,112 cases and 100,823 controls, to investigate the association of CRP variants with coronary heart disease. We compared our finding with that predicted from a meta-analysis of observational studies of CRP levels and risk of coronary heart disease. For the other loci associated with CRP levels, we selected the most closely associated SNP for testing against coronary heart disease among 14,365 cases and 32,069 controls. MAIN OUTCOME MEASURE: Risk of coronary heart disease. RESULTS: Polymorphisms in 5 genetic loci were strongly associated with CRP levels (% difference per minor allele): SNP rs6700896 in LEPR (-14.8%; 95% confidence interval [CI], -17.6% to -12.0%; P = 6.2 × 10^-22), rs4537545 in IL6R (-11.5%; 95% CI, -14.4% to -8.5%; P = 1.3 × 10^-12), rs7553007 in the CRP locus (-20.7%; 95% CI, -23.4% to -17.9%; P = 1.3 × 10^-38), rs1183910 in HNF1A (-13.8%; 95% CI, -16.6% to -10.9%; P = 1.9 × 10^-18), and rs4420638 in APOE-CI-CII (-21.8%; 95% CI, -25.3% to -18.1%; P = 8.1 × 10^-26). Association of SNP rs7553007 in the CRP locus with coronary heart disease gave an odds ratio (OR) of 0.98 (95% CI, 0.94 to 1.01) per 20% lower CRP level.
Our mendelian randomization study of variants in the CRP locus showed no association with coronary heart disease: OR, 1.00; 95% CI, 0.97 to 1.02; per 20% lower CRP level, compared with OR, 0.94; 95% CI, 0.94 to 0.95; predicted from meta-analysis of the observational studies of CRP levels and coronary heart disease (z score, -3.45; P < .001). SNPs rs6700896 in LEPR (OR, 1.06; 95% CI, 1.02 to 1.09; per minor allele), rs4537545 in IL6R (OR, 0.94; 95% CI, 0.91 to 0.97), and rs4420638 in the APOE-CI-CII cluster (OR, 1.16; 95% CI, 1.12 to 1.21) were all associated with risk of coronary heart disease. CONCLUSION: The lack of concordance between the effect on coronary heart disease risk of CRP genotypes and CRP levels argues against a causal association of CRP with coronary heart disease.
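The building blocks of such a comparison can be sketched as follows: recovering a log OR and its standard error from a reported 95% CI, the Wald ratio (SNP-outcome effect divided by SNP-exposure effect), rescaling a per-allele effect to "per 20% lower CRP" under an assumed log-linear relation, and a z test for the difference between two independent estimates. The abstract does not describe how variants were pooled or how the published z score was obtained, so this is a generic sketch, not the authors' computation, and plugging in the rounded published figures should not be expected to reproduce it.

```python
import math

Z95 = 1.959964  # two-sided 95% standard normal quantile

def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Log odds ratio and its standard error, recovered from a reported 95% CI."""
    return math.log(odds_ratio), (math.log(ci_high) - math.log(ci_low)) / (2 * Z95)

def wald_ratio(beta_snp_outcome, beta_snp_exposure):
    """Instrumental-variable (Wald ratio) estimate of the exposure's effect on the outcome."""
    return beta_snp_outcome / beta_snp_exposure

def log_or_per_fraction_lower(log_or_per_allele, fraction_lower_per_allele, fraction=0.20):
    """Rescale a per-allele log OR to a given proportional decrease of the exposure,
    assuming the outcome's log odds are linear in log(exposure)."""
    per_log_unit = wald_ratio(log_or_per_allele, math.log(1.0 - fraction_lower_per_allele))
    return per_log_unit * math.log(1.0 - fraction)

def z_difference(b1, se1, b2, se2):
    """z statistic for the difference between two independent estimates."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
```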
Abstract:
BACKGROUND: Little is known about engagement in multiple health behaviours among childhood cancer survivors. METHODS: Using latent class analysis, we identified health behaviour patterns in 835 adult survivors of childhood cancer (aged 20-35 years) and 1670 age- and sex-matched controls from the general population. Behaviour groups were determined from replies to questions on smoking, drinking, cannabis use, sporting activities, diet, sun protection and skin examination. RESULTS: The model identified four health behaviour patterns: 'risk-avoiding', with generally healthy behaviour; 'moderate drinking', with higher levels of sporting activity but moderate alcohol consumption; 'risk-taking', engaging in several risk behaviours; and 'smoking', smoking but not drinking. Similar proportions of survivors and controls fell into the 'risk-avoiding' (42% vs 44%) and 'risk-taking' (14% vs 12%) clusters, but more survivors were in the 'moderate drinking' cluster (39% vs 28%) and fewer in the 'smoking' cluster (5% vs 16%). Determinants of health behaviour cluster membership were gender, migration background, income and therapy. CONCLUSION: A proportion of childhood cancer survivors comparable to that of the general population engages in multiple health-compromising behaviours. Because of the increased vulnerability of survivors, multiple risk behaviours should be addressed in targeted health interventions.
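Latent class analysis assigns respondents to unobserved classes from their pattern of answers. A minimal sketch, assuming all indicators are binary (e.g. smokes yes/no), fits the model by expectation-maximization as a Bernoulli mixture; the study's actual items include non-binary answers, and dedicated LCA software also handles polytomous items, covariates and model selection. The two-class synthetic data are hypothetical.

```python
import numpy as np

def latent_class_em(Y, n_classes, n_iter=200, seed=0):
    """EM for a latent class model with binary indicators (Bernoulli mixture)."""
    rng = np.random.default_rng(seed)
    weights = np.full(n_classes, 1.0 / n_classes)                  # class prevalences
    probs = rng.uniform(0.25, 0.75, size=(n_classes, Y.shape[1]))  # P(item = 1 | class)
    for _ in range(n_iter):
        # E-step: posterior class membership, computed in log space for stability
        log_lik = Y @ np.log(probs).T + (1 - Y) @ np.log(1 - probs).T + np.log(weights)
        log_lik -= log_lik.max(axis=1, keepdims=True)
        post = np.exp(log_lik)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update prevalences and item-response probabilities
        weights = post.mean(axis=0)
        probs = (post.T @ Y) / post.sum(axis=0)[:, None]
        probs = np.clip(probs, 1e-6, 1 - 1e-6)
    return weights, probs, post

# Hypothetical answers to 7 yes/no behaviour items from two latent groups
rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 500)
p_item = np.where(truth[:, None] == 0, 0.85, 0.15)
Y = (rng.random((1000, 7)) < p_item).astype(float)
weights, probs, post = latent_class_em(Y, n_classes=2)
print(weights.round(2), probs.round(2))
```

The fitted `weights` correspond to the class proportions reported in the abstract (e.g. 42% 'risk-avoiding'), and `probs` describes each class's behaviour profile.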
Abstract:
A biplot, which is the multivariate generalization of the two-variable scatterplot, can be used to visualize the results of many multivariate techniques, especially those that are based on the singular value decomposition. We consider data sets consisting of continuous-scale measurements, their fuzzy coding and the biplots that visualize them, using a fuzzy version of multiple correspondence analysis. Of special interest is the way quality of fit of the biplot is measured, since it is well-known that regular (i.e., crisp) multiple correspondence analysis seriously under-estimates this measure. We show how the results of fuzzy multiple correspondence analysis can be defuzzified to obtain estimated values of the original data, and prove that this implies an orthogonal decomposition of variance. This permits a measure of fit to be calculated in the familiar form of a percentage of explained variance, which is directly comparable to the corresponding fit measure used in principal component analysis of the original data. The approach is motivated initially by its application to a simulated data set, showing how the fuzzy approach can lead to diagnosing nonlinear relationships, and finally it is applied to a real set of meteorological data.
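Fuzzy coding replaces a continuous value by memberships in a few categories. A common choice, assumed here, uses three triangular membership functions with hinges at the minimum, median and maximum; the memberships sum to 1, and the value is recovered exactly as the membership-weighted average of the hinges. In fuzzy multiple correspondence analysis the same back-transformation would be applied to fitted memberships to obtain the estimated data values mentioned above; this sketch shows only the coding and exact recovery, and it requires the three hinges to be distinct.

```python
import numpy as np

def fuzzy_code(x, hinges=None):
    """Fuzzy-code a continuous variable into 3 categories with triangular membership
    functions; by default the hinges are the minimum, median and maximum."""
    x = np.asarray(x, dtype=float)
    if hinges is None:
        hinges = np.array([x.min(), np.median(x), x.max()])
    t1, t2, t3 = hinges
    low = np.clip((t2 - x) / (t2 - t1), 0.0, 1.0)
    high = np.clip((x - t2) / (t3 - t2), 0.0, 1.0)
    mid = 1.0 - low - high
    return np.column_stack([low, mid, high]), hinges

def defuzzify(memberships, hinges):
    """Back-transform: membership-weighted average of the hinge values."""
    return memberships @ hinges

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
Z, h = fuzzy_code(x)
print(Z.round(3))
```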
Abstract:
This work proposes an original contribution to the understanding of fishermen's spatial behavior, based on the behavioral ecology and movement ecology paradigms. Through the analysis of Vessel Monitoring System (VMS) data, we characterized the spatial behavior of Peruvian anchovy fishermen at different scales: (1) the behavioral modes within fishing trips (i.e., searching, fishing and cruising); (2) the behavioral patterns among fishing trips; (3) the behavioral patterns by fishing season conditioned by ecosystem scenarios; and (4) the computation of maps of an anchovy presence proxy from the spatial patterns of behavioral mode positions. At the first scale considered, we compared several Markovian models (hidden Markov and semi-Markov models) and discriminative models (random forests, support vector machines and artificial neural networks) for inferring the behavioral modes associated with VMS tracks. The models were trained in a supervised setting and validated using tracks for which the behavioral modes were known (from on-board observers' records). Hidden semi-Markov models performed best and were retained for inferring the behavioral modes on the entire VMS dataset. At the second scale considered, each fishing trip was characterized by several features, including the time spent within each behavioral mode. Using a clustering analysis, fishing trip patterns were classified into groups associated with management zones, fleet segments and skippers' personalities. At the third scale considered, we analyzed how ecological conditions shaped fishermen's behavior. By means of co-inertia analyses, we found significant associations between fishermen, anchovy and environmental spatial dynamics, and fishermen's behavioral responses were characterized according to contrasted environmental scenarios. At the fourth scale considered, we investigated whether the spatial behavior of fishermen reflected to some extent the spatial distribution of anchovy.
Finally, this work provides a wider view of fishermen's behavior: fishermen are not only economic agents but also foragers, constrained by ecosystem variability. To conclude, we discuss how these findings may be important for fisheries management, collective behavior analyses and end-to-end models.
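Inferring behavioral modes from a track amounts to decoding the most likely hidden-state sequence. The sketch below runs Viterbi decoding for a plain hidden Markov model with Gaussian emissions on vessel speed only; the study retained hidden semi-Markov models (which also model how long each mode lasts) trained on observer data, so the model type, the single speed observation and all parameter values here are hypothetical simplifications.

```python
import numpy as np

def viterbi_gaussian(obs, start_prob, trans_prob, means, sds):
    """Most likely hidden-state path of an HMM with 1-D Gaussian emissions."""
    obs = np.asarray(obs, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    n_obs, n_states = len(obs), len(start_prob)
    # Log emission densities, shape (n_obs, n_states)
    log_emit = -0.5 * ((obs[:, None] - means) / sds) ** 2 - np.log(sds * np.sqrt(2.0 * np.pi))
    log_trans = np.log(trans_prob)
    delta = np.log(start_prob) + log_emit[0]
    backptr = np.zeros((n_obs, n_states), dtype=int)
    for t in range(1, n_obs):
        scores = delta[:, None] + log_trans          # scores[i, j]: move from state i to j
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    # Trace the best path backwards from the most likely final state
    path = np.empty(n_obs, dtype=int)
    path[-1] = delta.argmax()
    for t in range(n_obs - 1, 0, -1):
        path[t - 1] = backptr[t, path[t]]
    return path

# Hypothetical modes: 0 = fishing (slow), 1 = searching, 2 = cruising (fast); speeds in knots
start = np.full(3, 1.0 / 3.0)
trans = np.full((3, 3), 0.1) + 0.7 * np.eye(3)      # modes tend to persist
speeds = [11, 10.5, 11.2, 6, 5.8, 1.2, 1.0, 1.4, 6.1, 11]
path = viterbi_gaussian(speeds, start, trans, means=[1.5, 6.0, 11.0], sds=[1.0, 1.5, 1.5])
print(path.tolist())
```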
Estimation of surface roughness in a semiarid region from C-band ERS-1 synthetic aperture radar data
Abstract:
In this study, we investigated the feasibility of using C-band synthetic aperture radar (SAR) data from the European Remote Sensing Satellite (ERS-1) to estimate surface soil roughness in a semiarid rangeland. Radar backscattering coefficients were extracted from a dry-season and a wet-season SAR image and compared with 47 in situ soil roughness measurements obtained on the rocky soils of the Walnut Gulch Experimental Watershed, southeastern Arizona, USA. Both the dry-season and the wet-season SAR data showed exponential relationships with root mean square (RMS) height measurements. The dry-season C-band ERS-1 SAR data were strongly correlated with RMS height (R² = 0.80), while the wet-season SAR data showed more scatter (R² = 0.59). This lower correlation was probably caused by the stronger influence of soil moisture, which may not be negligible in the wet-season SAR data. We conclude that single-configuration C-band SAR data are useful for estimating the surface roughness of rocky soils in a semiarid rangeland.
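An exponential relationship and its R² can be fitted as sketched below, by linear least squares on the logarithm of the response. It assumes the backscatter coefficient is expressed in linear power units (so it is positive) and is modelled as a function of RMS height; the abstract does not say which variable was treated as dependent or whether the fit was done in linear units or in dB, and the demo data are hypothetical.

```python
import numpy as np

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by least squares on log(y); requires y > 0."""
    b, log_a = np.polyfit(x, np.log(y), 1)   # polyfit returns [slope, intercept]
    return np.exp(log_a), b

def r_squared(y, y_hat):
    """Coefficient of determination on the original (untransformed) scale."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical RMS heights (cm) and noise-free backscatter in linear units
rms_height = np.array([0.5, 1.0, 1.5, 2.0, 3.0])
sigma0 = 0.02 * np.exp(0.8 * rms_height)
a, b = fit_exponential(rms_height, sigma0)
print(a, b, r_squared(sigma0, a * np.exp(b * rms_height)))
```

Fitting on the log scale weights relative rather than absolute errors; a nonlinear least-squares fit on the original scale would give somewhat different coefficients on noisy data.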
Abstract:
PURPOSE: Quality of care and its measurement represent a considerable challenge for pediatric smaller-scale comprehensive cancer centers (pSSCC) providing surgical oncology services. It remains unclear whether center size and/or yearly case-flow numbers influence the quality of care, and therefore outcomes, for this patient population. PATIENTS AND METHODS: We performed a 14-year, retrospective, single-center analysis, assessing adherence to treatment protocols and surgical adverse events as quality indicators in abdominal and thoracic pediatric solid tumor surgery. RESULTS: Forty-eight patients, enrolled in a research-associated treatment protocol, underwent 51 cancer-oriented surgical procedures. All the protocols contain precise technical criteria, indications, and instructions for tumor surgery. Overall, compliance with these items was very high, with 997/1,035 items (95 %) meeting protocol requirements. There was no surgical mortality. Twenty-one patients (43 %) had one or more complications, for a total of 34 complications (66 % of procedures). Overall, 85 % of complications were grade 1 or 2 according to the Clavien-Dindo classification, requiring observation or minor medical treatment. Case-sample and outcome/effectiveness data were comparable to published series. Overall, our data suggest that even with the modest caseload of a pSSCC within a Swiss tertiary academic hospital, compliance with international standards can be very high, and the incidence of adverse events can be kept to a minimum. CONCLUSION: Open and objective data sharing, and discussion between pSSCCs, will ultimately benefit our patient populations. Our study is an initial step towards the enhancement of critical self-review and quality-of-care measurements in this setting.