925 results for Data clustering. Fuzzy C-Means. Cluster centers initialization. Validation indices
Abstract:
Doctoral thesis in Sociology
Abstract:
Data warehouse populating processes are usually data-oriented workflows composed of dozens of granular tasks that are responsible for integrating data from different data sources. Specific subsets of these tasks can be grouped into a collection, together with their relationships, to form higher-level constructs. Increasing task granularity allows for the generalization of processes, simplifying their views and providing methods to carry expertise over to new applications. Well-proven practices can be used to describe general solutions that use basic skeletons configured and instantiated according to a set of specific integration requirements. Patterns can be applied to ETL processes not only to simplify a possible conceptual representation but also to reduce the gap that often exists between two design perspectives. In this paper, we demonstrate the feasibility and effectiveness of an ETL pattern-based approach using task clustering, analyzing a real-world ETL scenario through the definitions of two commonly used clusters of tasks: a data lookup cluster and a data conciliation and integration cluster.
Abstract:
Worldwide, around 9% of children are born before 37 weeks of gestation, which puts the premature child at risk because he or she is not yet prepared to develop a number of basic functions that begin soon after birth. In order to ensure that these at-risk pregnancies are properly monitored by obstetricians in time to avoid those problems, Data Mining (DM) models were induced in this study to predict preterm births in a real environment using data from 3376 patients (women) admitted to the maternal and perinatal care unit of Centro Hospitalar of Oporto. A sensitive metric to predict preterm deliveries was developed, assisting physicians in the decision-making process regarding the patients’ observation. The results are promising, with sensitivity and specificity values of 96% and 98%, respectively.
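As a reminder of how the two reported metrics are defined, the sketch below computes sensitivity and specificity from the counts of a binary confusion matrix. The function name and the counts are hypothetical: the counts are not the study's data and were chosen only so that the ratios come out at 96% and 98%.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)  # share of actual preterm births that were flagged
    specificity = tn / (tn + fp)  # share of term births correctly left unflagged
    return sensitivity, specificity


# Hypothetical counts for illustration only, not taken from the study.
sens, spec = sensitivity_specificity(tp=48, fn=2, tn=490, fp=10)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```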
Abstract:
In this study, we concentrate on modelling gross primary productivity using two simple approaches to simulate canopy photosynthesis: "big leaf" and "sun/shade" models. Two approaches for calibration are used: scaling up of canopy photosynthetic parameters from the leaf to the canopy level and fitting canopy biochemistry to eddy covariance fluxes. Validation of the models is achieved by using eddy covariance data from the LBA site C14. Comparing the performance of both models, we conclude that numerically (in terms of goodness of fit) and qualitatively (in terms of residual response to different environmental variables) sun/shade does a better job. Compared to the sun/shade model, the big leaf model shows a lower goodness of fit and fails to respond to variations in the diffuse fraction, also having skewed responses to temperature and VPD. The separate treatment of sun and shade leaves, in combination with the separation of the incoming light into direct beam and diffuse, makes sun/shade a strong modelling tool that catches more of the observed variability in canopy fluxes as measured by eddy covariance. In conclusion, the sun/shade approach is a relatively simple and effective tool for modelling photosynthetic carbon uptake that could be easily included in many terrestrial carbon models.
Abstract:
OBJECTIVE - A population-based prospective study was analysed to: a) determine the prevalence of hypertension; b) investigate the clustering of other cardiovascular risk factors; and c) verify whether older adults differed from younger adults in the pattern of clustering. METHODS - The data comprised a representative sample of the population of Bambuí, Brazil. Multiple logistic regression was used to investigate the independent association between hypertension and selected factors. RESULTS - A total of 820 younger adults (82.5%) and 1494 older adults (85.9%) participated in this study. The overall prevalence of hypertension was 24.8% (SE=1.4%), being higher in women (26.9±1.5%) than in men (22.0±1.7%) (p=0.033). Hypertension was positively and significantly associated with physical inactivity, overweight, hypercholesterolemia, hyperglycemia and hypertriglyceridemia. The coexistence of hypertension with 4 or more of these risk factors occurred 6 times more often than expected by chance, after adjusting for age and sex (OR=6.3; 95%CI: 3.4-11.9). The pattern of risk factor clustering in hypertensive individuals differed with age. CONCLUSION - Our results reinforce the need to increase detection and treatment of hypertension and to approach patients' global risk profiles.
Abstract:
Football is nowadays considered one of the most popular sports. In the betting world it has acquired an outstanding position, moving millions of euros during a single football match. The lack of profitability of football betting users has been stressed as a problem. This lack gave rise to this research proposal, which analyses whether there is a way to help users increase the profits on their bets. Data mining models were induced with the purpose of supporting gamblers in increasing their profits in the medium/long term. Although the models can fail, the results achieved for four of the seven targets are encouraging and suggest that the system can help to increase profits. All defined targets have two possible classes to predict, for example whether there are more or fewer than 7.5 corners in a single game. The data mining models for the targets more or fewer than 7.5 corners, 8.5 corners, 1.5 goals and 3.5 goals achieved the pre-defined thresholds. The models were implemented in a prototype, which is a pervasive decision support system. This system was developed to serve as an interface for any user, whether an expert or someone with no knowledge of football.
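Each target is a two-class label obtained by comparing a match statistic with a fixed line. A minimal sketch of that labelling for the four thresholds named in the abstract follows; the function name, the dictionary keys and the sample match are hypothetical and do not come from the study.

```python
# Thresholds named in the abstract, as (statistic, line) pairs.
TARGETS = [("corners", 7.5), ("corners", 8.5), ("goals", 1.5), ("goals", 3.5)]


def label_match(stats):
    """Map a match's totals to one over/under class per target."""
    labels = {}
    for stat, line in TARGETS:
        # Half-point lines mean a total can never equal the line exactly.
        labels[f"{stat}_{line}"] = "over" if stats[stat] > line else "under"
    return labels


# Hypothetical match with 8 corners and 2 goals.
example = label_match({"corners": 8, "goals": 2})
print(example)
```

A classifier for one target would then be trained on pre-match features with the corresponding label as its class.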
Abstract:
Healthcare organizations often benefit from information technologies as well as embedded decision support systems, which improve the quality of services and help prevent complications and adverse events. In Centro Materno Infantil do Norte (CMIN), the maternal and perinatal care unit of Centro Hospitalar of Oporto (CHP), an intelligent pre-triage system is implemented, aiming to prioritize patients in need of gynaecology and obstetrics care in two classes: urgent and consultation. The system is designed to avoid emergency problems such as incorrect triage outcomes and extensive triage waiting times. The current study intends to improve the triage system, and therefore optimize the patient workflow through the emergency room, by predicting the triage waiting time between the patient's triage and medical admission. For this purpose, data mining (DM) techniques are applied to selected information provided by the information technologies implemented in CMIN. The DM models achieved accuracy values of approximately 94% with a five-range target distribution, which not only yields confident prediction models but also identifies the variables that stand as direct inducers of the triage waiting times.
Abstract:
System identification, evolutionary automatic, data-driven model, fuzzy Takagi-Sugeno grammar, genotype interpretability, toxicity-prediction
Abstract:
Online Data Mining, Data Streams, Classification, Clustering
Abstract:
Magdeburg, Univ., Faculty of Computer Science, Diss., 2010
Abstract:
Magdeburg, Univ., Faculty of Computer Science, Diss., 2014
Abstract:
The long-term goal of this research is to develop a program able to automatically segment and categorize textual sequences into discourse types. In this preliminary contribution, we present the construction of an algorithm which takes a segmented text as input and attempts to categorize its sequences as narrative, argumentative, descriptive and so on. This work also aims to investigate a possible convergence between unsupervised statistical learning and the typological approach developed, in particular in the field of text and discourse analysis in French, by Adam (2008) and Bronckart (1997).
Abstract:
Fractal geometry is a fundamental approach for describing the complex irregularities of the spatial structure of point patterns. The present research characterizes the spatial structure of the Swiss population distribution in the three Swiss geographical regions (Alps, Plateau and Jura) and at the entire country level. These analyses were carried out using fractal and multifractal measures for point patterns, which enabled the estimation of the spatial degree of clustering of a distribution at different scales. The Swiss population dataset is presented on a grid of points and thus it can be modelled as a "point process" where each point is characterized by its spatial location (geometrical support) and a number of inhabitants (measured variable). The fractal characterization was performed by means of the box-counting dimension, and the multifractal analysis was conducted through the Rényi generalized dimensions and the multifractal spectrum. Results showed that the four population patterns are all multifractal and present different clustering behaviours. Applying multifractal and fractal methods to different geographical regions and at different scales allowed us to quantify and describe the dissimilarities between the four structures and their underlying processes. This paper is the first Swiss geodemographic study applying multifractal methods using high-resolution data.
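The box-counting dimension mentioned above can be estimated by covering the point pattern with grids of decreasing cell size and fitting the slope of log N(eps) against log(1/eps), where N(eps) is the number of occupied cells. The sketch below is a generic illustration under simplifying assumptions (unweighted 2-D points in the unit square, ordinary least squares); it ignores the inhabitant counts used in the multifractal analysis and is not the authors' implementation. The function name and test patterns are hypothetical.

```python
import math


def box_counting_dimension(points, sizes):
    """Estimate the box-counting dimension of 2-D points in the unit square."""
    log_inv_eps, log_counts = [], []
    for eps in sizes:
        # Set of grid cells of side eps that contain at least one point.
        occupied = {(int(x // eps), int(y // eps)) for x, y in points}
        log_inv_eps.append(math.log(1.0 / eps))
        log_counts.append(math.log(len(occupied)))
    # Least-squares slope of log N(eps) against log(1/eps).
    n = len(sizes)
    mx = sum(log_inv_eps) / n
    my = sum(log_counts) / n
    num = sum((a - mx) * (b - my) for a, b in zip(log_inv_eps, log_counts))
    den = sum((a - mx) ** 2 for a in log_inv_eps)
    return num / den


sizes = [1 / 2, 1 / 4, 1 / 8, 1 / 16]
# Two synthetic patterns: a filled 64 x 64 grid and a single horizontal line.
grid = [((i + 0.5) / 64, (j + 0.5) / 64) for i in range(64) for j in range(64)]
line = [((i + 0.5) / 64, 0.5) for i in range(64)]
dim_grid = box_counting_dimension(grid, sizes)
dim_line = box_counting_dimension(line, sizes)
print(dim_grid, dim_line)
```

A pattern that fills the plane gives a slope close to 2 and a line gives a slope close to 1; clustered patterns fall in between.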
Abstract:
The use of the Internet now has a specific purpose: to find information. Unfortunately, the amount of data available on the Internet is growing exponentially, creating what can be considered a nearly infinite and ever-evolving network with no discernable structure. This rapid growth has raised the question of how to find the most relevant information. Many different techniques have been introduced to address the information overload, including search engines, Semantic Web, and recommender systems, among others. Recommender systems are computer-based techniques that are used to reduce information overload and recommend products likely to interest a user when given some information about the user's profile. This technique is mainly used in e-Commerce to suggest items that fit a customer's purchasing tendencies. The use of recommender systems for e-Government is a research topic that is intended to improve the interaction among public administrations, citizens, and the private sector through reducing information overload on e-Government services. More specifically, e-Democracy aims to increase citizens' participation in democratic processes through the use of information and communication technologies. In this chapter, an architecture of a recommender system that uses fuzzy clustering methods for e-Elections is introduced. In addition, a comparison with the smartvote system, a Web-based Voting Assistance Application (VAA) used to aid voters in finding the party or candidate that is most in line with their preferences, is presented.
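The abstract does not say which fuzzy clustering method the architecture uses. As one common example, the sketch below implements plain fuzzy c-means, in which every profile receives a degree of membership in every cluster rather than a single hard label. The profile vectors, initial centers, fuzzifier m = 2 and iteration count are all assumptions made for illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Membership update: u[i][j] is the degree of point i in cluster j."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        if 0.0 in d:
            # A point lying exactly on a center belongs fully to it.
            hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
            row = [h / sum(hits) for h in hits]
        else:
            row = [1.0 / sum((dj / dk) ** power for dk in d) for dj in d]
        u.append(row)
    return u


def fcm_centers(points, u, m=2.0):
    """Center update: mean of the points weighted by membership ** m."""
    dims = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append(tuple(
            sum(wi * p[k] for wi, p in zip(w, points)) / total
            for k in range(dims)
        ))
    return centers


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate the two updates for a fixed number of iterations."""
    for _ in range(iters):
        u = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, u, m)
    return centers, fcm_memberships(points, centers, m)


# Hypothetical two-question profiles (answers scaled to [0, 1]).
profiles = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
centers, u = fuzzy_c_means(profiles, centers=[(0.0, 0.1), (1.0, 0.9)])
print(centers)
print(u)
```

In a recommender setting, the membership rows could be used to rank candidates or parties by closeness to a voter, which is where the graded memberships differ from a hard assignment.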
Abstract:
BACKGROUND: School-based intervention studies promoting a healthy lifestyle have shown favorable immediate health effects. However, there is a striking paucity of long-term follow-ups. The aim of this study was therefore to assess the 3-year follow-up of a cluster-randomized controlled school-based physical activity program over nine months with beneficial immediate effects on body fat, aerobic fitness and physical activity. METHODS AND FINDINGS: Initially, 28 classes from 15 elementary schools in Switzerland were grouped into an intervention arm (16 classes from 9 schools, n = 297 children) and a control arm (12 classes from 6 schools, n = 205 children) after stratification for grade (1st and 5th graders). The multi-component physical activity program lasted nine months and included daily physical education (i.e. two additional lessons per week on top of three regular lessons), short physical activity breaks during academic lessons, and daily physical activity homework. Three years after the end of the program, 289 children (58%) participated in the follow-up. Primary outcome measures included body fat (sum of four skinfolds), aerobic fitness (shuttle run test), physical activity (accelerometry), and quality of life (questionnaires). After adjustment for grade, gender, baseline value and clustering within classes, children in the intervention arm compared with controls had a significantly higher average level of aerobic fitness at follow-up (0.373 z-score units [95%-CI: 0.157 to 0.59, p = 0.001], corresponding to a shift from the 50th to the 65th percentile between baseline and follow-up), while the immediate beneficial effects on the other primary outcomes were not sustained. CONCLUSIONS: Apart from aerobic fitness, beneficial effects seen after one year were not maintained when the intervention was stopped. A continuous intervention seems necessary to maintain overall beneficial health effects as reached at the end of the intervention. TRIAL REGISTRATION: ControlledTrials.com ISRCTN15360785.
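The stated correspondence between 0.373 z-score units and a shift from the 50th to the 65th percentile can be checked with the standard normal cumulative distribution function. The sketch below assumes the z-scores are standard-normally distributed, which the abstract does not state explicitly; the function name is hypothetical.

```python
import math


def normal_percentile(z):
    """Percentile of a z-score under a standard normal distribution."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


# A child at the median (z = 0) who gains 0.373 z-score units.
print(normal_percentile(0.0), normal_percentile(0.373))
```

Under that assumption the shifted value lands at roughly the 65th percentile, in line with the figure reported in the abstract.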