853 results for height partition clustering
Abstract:
OBJECTIVE: To estimate the incidence rate of type 1 diabetes in the urban area of Santiago, Chile, from March 21, 1997 to March 20, 1998, and to assess the spatio-temporal clustering of cases during that period. METHODS: All sixty-one incident cases were located temporally (day of diagnosis) and spatially (place of residence) in the study area. Knox's method was used to assess spatio-temporal clustering of incident cases. RESULTS: The overall incidence rate of type 1 diabetes was 4.11 cases per 100,000 children aged less than 15 years per year (95% confidence interval: 3.06–5.14). The incidence rate appears to have increased since the last estimate, calculated for the years 1986–1992 in the metropolitan region of Santiago. Different combinations of space-time intervals were evaluated to assess spatio-temporal clustering. The smallest p-value was found for the combination of critical distances of 750 meters and 60 days (uncorrected p-value = 0.048). CONCLUSIONS: Although these are preliminary results regarding space-time clustering in Santiago, exploratory analysis of the data suggests a possible aggregation of incident cases in space-time coordinates.
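Knox's method counts the pairs of cases that are close in both space and time. The following is a minimal Python sketch of that idea, assuming planar coordinates in meters and diagnosis dates expressed as day numbers. The function names and the case list are hypothetical, and the Monte Carlo permutation of diagnosis dates is only one common way to obtain a p-value; the abstract does not say how its p-value was computed.

```python
import math
import random
from itertools import combinations


def knox_statistic(cases, max_dist_m, max_days):
    """Count case pairs within both the critical distance and the critical time."""
    close_pairs = 0
    for (x1, y1, t1), (x2, y2, t2) in combinations(cases, 2):
        near_in_space = math.hypot(x1 - x2, y1 - y2) <= max_dist_m
        near_in_time = abs(t1 - t2) <= max_days
        if near_in_space and near_in_time:
            close_pairs += 1
    return close_pairs


def knox_permutation_p(cases, max_dist_m, max_days, n_perm=999, seed=0):
    """Monte Carlo p-value: shuffle diagnosis days while keeping locations fixed."""
    rng = random.Random(seed)
    observed = knox_statistic(cases, max_dist_m, max_days)
    days = [t for _, _, t in cases]
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(days)
        permuted = [(x, y, t) for (x, y, _), t in zip(cases, days)]
        if knox_statistic(permuted, max_dist_m, max_days) >= observed:
            at_least_as_extreme += 1
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical cases: (x in meters, y in meters, day of diagnosis)
cases = [(0, 0, 0), (100, 0, 10), (5000, 5000, 200), (5100, 5000, 205), (9000, 0, 400)]
observed, p_value = knox_permutation_p(cases, max_dist_m=750, max_days=60)
```

Because several space-time thresholds are usually tried, a p-value obtained this way for the best combination is uncorrected for multiple testing, as the abstract notes for its own result.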
Abstract:
In this article we consider the monoid $O_{m\times n}$ of all order-preserving full transformations on a chain with $mn$ elements that preserve a uniform $m$-partition, and its submonoids $O_{m\times n}^{+}$ and $O_{m\times n}^{-}$ of all extensive transformations and of all co-extensive transformations, respectively. We determine their ranks and construct a bilateral semidirect product decomposition of $O_{m\times n}$ in terms of $O_{m\times n}^{-}$ and $O_{m\times n}^{+}$.
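For small values of m and n the objects in this abstract can be enumerated by brute force. The sketch below is an illustration under stated assumptions, not the article's construction: the chain is taken to be 0 < 1 < ... < mn-1, the uniform partition is assumed to consist of m consecutive blocks of n elements, and "preserving the partition" is read as "each block is mapped into a single block". All function names are hypothetical.

```python
from itertools import combinations_with_replacement


def order_preserving_maps(size):
    """All nondecreasing maps of the chain 0 < 1 < ... < size-1 into itself.

    Each map is a tuple f where f[x] is the image of x.
    """
    return combinations_with_replacement(range(size), size)


def preserves_uniform_partition(f, m, n):
    """True if every block {b*n, ..., b*n + n - 1} is sent into one block."""
    return all(len({f[b * n + i] // n for i in range(n)}) == 1 for b in range(m))


def monoid(m, n):
    """Order-preserving, partition-preserving full transformations (assumed reading)."""
    return [f for f in order_preserving_maps(m * n) if preserves_uniform_partition(f, m, n)]


def is_extensive(f):
    """x <= f(x) for every x in the chain."""
    return all(x <= fx for x, fx in enumerate(f))


def is_co_extensive(f):
    """f(x) <= x for every x in the chain."""
    return all(fx <= x for x, fx in enumerate(f))


elements = monoid(2, 2)
extensive = [f for f in elements if is_extensive(f)]
co_extensive = [f for f in elements if is_co_extensive(f)]
```

With m = 1 the partition condition is vacuous, so `monoid(1, n)` reduces to the monoid of all order-preserving full transformations of an n-element chain, which gives a simple sanity check.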
Abstract:
The growing importance and influence of new resources connected to power systems has caused many changes in their operation. Environmental policies and several well-known advantages have led to the wide dissemination of renewable-based energy resources. These resources, including Distributed Generation (DG), are being connected at lower voltage levels, where Demand Response (DR) must also be considered. These changes increase the complexity of system operation because of both new operational constraints and the amount of data to be processed. Virtual Power Players (VPP) are entities able to manage these resources. To address these issues, this paper proposes a methodology to support VPP actions when a VPP acts as a Curtailment Service Provider (CSP) that provides DR capacity to a DR program declared by the Independent System Operator (ISO) or by the VPP itself. The amount of DR capacity that the CSP can assure is determined using data mining techniques applied to a database obtained for a large set of operation scenarios. The paper includes a case study based on 27,000 scenarios considering a diversity of distributed resources in a 33-bus distribution network.
Abstract:
This paper proposes a methodology based on data mining techniques to support the analysis of zonal prices in real transmission networks. The methodology uses clustering algorithms to group the buses into typical classes, each containing a set of buses with similar Locational Marginal Price (LMP) values. Two clustering algorithms were used to determine the LMP clusters: the two-step and K-means algorithms. Adequacy measurement indices are used to evaluate the quality of the partition and to identify the best-performing algorithm. The paper includes a case study that uses an LMP database from the California ISO (CAISO) to identify zonal prices.
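The K-means step can be pictured with a small one-dimensional example. This is a minimal sketch in plain Python, assuming each bus is described by a single LMP value. The bus prices are hypothetical, and the deterministic initialization and the within-cluster sum of squares are illustrative choices, not the indices or settings used in the paper.

```python
def assign(values, centres):
    """Label each value with the index of its nearest centre."""
    return [min(range(len(centres)), key=lambda c: abs(v - centres[c])) for v in values]


def kmeans_1d(values, k, n_iter=100):
    """Lloyd's algorithm on scalar values with a deterministic initialization."""
    ordered = sorted(values)
    # Spread the initial centres across the sorted values (an assumption).
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(n_iter):
        labels = assign(values, centres)
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # Keep the old centre if a cluster happens to be empty.
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return assign(values, centres), centres


def within_cluster_ss(values, labels, centres):
    """Sum of squared distances to the assigned centre (lower means tighter zones)."""
    return sum((v - centres[lab]) ** 2 for v, lab in zip(values, labels))


# Hypothetical LMP values per bus, in $/MWh
lmp = {"bus1": 30.1, "bus2": 30.4, "bus3": 45.0, "bus4": 44.6, "bus5": 60.2, "bus6": 59.8}
labels, centres = kmeans_1d(list(lmp.values()), k=3)
zones = dict(zip(lmp, labels))
```

Each resulting label can be read as a candidate price zone, and comparing `within_cluster_ss` across different values of `k` or across algorithms is one simple way to judge the partition.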
Abstract:
Master's degree in Informatics Engineering
Abstract:
Wastewater from the cork processing industry presents high levels of organic and phenolic compounds, such as tannins, with low biodegradability and significant toxicity. These compounds are not readily removed by conventional municipal wastewater treatment, which is largely based on primary sedimentation followed by biological treatment. The purpose of this work is to study the biodegradability of different cork wastewater fractions, obtained through membrane separation, in order to assess their potential for biological treatment, with a view to valorisation through the recovery of tannins, which could be applied in other industries. Various ultrafiltration and nanofiltration membranes were used, with molecular weight cut-offs (MWCO) ranging from 0.125 to 91 kDa. The wastewater and the different permeated fractions were analyzed in terms of Total Organic Carbon (TOC), Chemical Oxygen Demand (COD), Biochemical Oxygen Demand (BOD), Total Phenols (TP), Tannins, Color, pH and Conductivity. Results for the wastewater showed that it is characterized by a high organic content (670.5-1056.8 mg TOC/L, 2285-2604 mg COD/L, 1000-1225 mg BOD/L), a relatively low biodegradability (0.35-0.38 for BOD5/COD and 0.44-0.47 for BOD20/COD) and a high content of phenols (360-410 mg tannic acid/L) and tannins (250-270 mg tannic acid/L). The results for the wastewater fractions showed a general decrease in the pollutant content of the permeates, and an increase in their biodegradability, as the MWCO of the membrane applied decreased. In particular, the permeated fraction from the membrane with an MWCO of 3.8 kDa presented a favourable biodegradability index (0.8) and a minimized phenol toxicity, which enables it to undergo biological treatment and therefore to be treated in a municipal wastewater treatment plant. Also, from the perspective of valorisation, the rejected fraction obtained with this membrane MWCO may have a significant potential for tannin recovery.
Permeated fractions from membranes with an MWCO lower than 3.8 kDa presented a particularly significant decline in organic matter and phenols, enabling these permeates to be reused in cork processing and so representing an interesting prospect of zero discharge for the cork industry, with evident environmental and economic advantages. (C) 2010 Elsevier Ltd. All rights reserved.
Abstract:
In order to overcome the problems associated with low water solubility, and consequently low bioavailability, of active pharmaceutical ingredients (APIs), herein we explore a modular ionic liquid synthetic strategy for improved APIs. Ionic liquids containing l-ampicillin as the active pharmaceutical ingredient anion were prepared using the methodology developed in our previous work, using organic cations selected from substituted ammonium, phosphonium, pyridinium and methylimidazolium salts, with the intent of enhancing the solubility and bioavailability of l-ampicillin forms. In order to evaluate important properties of the synthesized API-ILs, the water solubility at 25 °C and 37 °C (body temperature), as well as the octanol–water partition coefficients (Kow) and the HDPC micelle partition at 25 °C, were measured. Critical micelle concentrations (CMCs) in water at 25 °C and 37 °C of the pharmaceutical ionic liquids bearing cations with surfactant properties were also determined from ionic conductivity measurements.
Abstract:
3rd SMTDA Conference Proceedings, 11-14 June 2014, Lisbon, Portugal.
Abstract:
TPM Vol. 21, No. 4, December 2014, 435-447 – Special Issue © 2014 Cises.
Abstract:
With the growing generation, storage and dissemination of information in recent years, the former problem of a lack of information has turned into a problem of extracting useful knowledge from the information available. Visual representations of abstract information have been used to aid the interpretation of data and to reveal patterns that would otherwise remain hidden. Information visualization seeks to augment human cognition by exploiting human visual capabilities, making abstract information perceptible and providing the means for a person to absorb growing amounts of information using their perceptual abilities. The goal of data clustering techniques is to divide a data set into several groups, such that similar data are placed in the same group and dissimilar data in different groups. More specifically, constrained data clustering aims to incorporate a priori knowledge into the clustering process, in order to improve the quality of the clustering and, at the same time, to find solutions suited to specific tasks and interests. This dissertation studies the Interactive Visual Clustering approach, which allows the user, through interaction with a visual representation of the information, to incorporate prior knowledge about the data domain so as to steer the resulting clustering towards their goals. This approach combines and extends techniques from interactive information visualization, force-directed graph drawing and constrained data clustering. To evaluate the performance of different user interaction strategies, comparative studies are carried out using synthetic and real data sets.
Abstract:
OBJECTIVE: To validate a new symphysis-fundal curve for screening fetal growth deviations and to compare its performance with the standard curve adopted by the Brazilian Ministry of Health. METHODS: Observational study including a total of 753 low-risk pregnant women with gestational age above 27 weeks, conducted between March and October 2006 in the city of João Pessoa, Northeastern Brazil. Symphysis-fundal height was measured using a standard technique recommended by the Brazilian Ministry of Health. Estimated fetal weight, assessed through ultrasound using the Brazilian fetal weight chart for gestational age, was the gold standard. In a subsample of 122 women with neonatal weight measurements taken up to seven days after the estimated fetal weight measurements, the symphysis-fundal classification was compared with the Lubchenco growth reference curve as the gold standard. Sensitivity, specificity, and positive and negative predictive values were calculated. The McNemar χ² test was used to compare the sensitivity of the two symphysis-fundal curves studied. RESULTS: The sensitivity of the new curve for detecting small for gestational age fetuses was 51.6%, while that of the Brazilian Ministry of Health reference curve was significantly lower (12.5%). In the subsample using neonatal weight as the gold standard, the sensitivity of the new reference curve was 85.7%, while that of the Brazilian Ministry of Health curve was 42.9% for detecting small for gestational age. CONCLUSIONS: The diagnostic performance of the new curve for detecting small for gestational age fetuses was significantly higher than that of the Brazilian Ministry of Health reference curve.
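The screening measures named in the methods follow directly from a 2x2 table of screening result against gold standard. The sketch below shows the arithmetic in Python. All counts are hypothetical and are not taken from the study; the McNemar statistic is written without a continuity correction, which is an assumption, since the abstract does not say which variant was used.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Screening performance from a 2x2 table against the gold standard."""
    return {
        "sensitivity": tp / (tp + fn),  # affected cases flagged by the curve
        "specificity": tn / (tn + fp),  # unaffected cases correctly not flagged
        "ppv": tp / (tp + fp),          # flagged cases that are truly affected
        "npv": tn / (tn + fn),          # unflagged cases that are truly unaffected
    }


def mcnemar_chi2(only_first, only_second):
    """McNemar chi-square (1 degree of freedom, no continuity correction).

    only_first:  affected cases detected by the first curve but not the second
    only_second: affected cases detected by the second curve but not the first
    """
    return (only_first - only_second) ** 2 / (only_first + only_second)


# Hypothetical counts for one curve
metrics = diagnostic_metrics(tp=40, fp=20, fn=10, tn=130)

# Hypothetical discordant pairs when both curves are applied to the same affected fetuses
chi2 = mcnemar_chi2(only_first=12, only_second=2)
```

McNemar's test suits this comparison because both curves are applied to the same women, so only the discordant pairs carry information about the difference in sensitivity.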
Abstract:
The objectives of this work were: (1) to identify an isotherm model to relate the contaminant contents in the gas phase with those in the solid and non-aqueous liquid phases; (2) to develop a methodology for estimating the contaminant distribution in the different phases of the soil; and (3) to evaluate the influence of soil water content on the contaminant distribution in soil. For sandy soils with negligible contents of clay and natural organic matter, contaminated with benzene, toluene, ethylbenzene, xylene, trichloroethylene (TCE), and perchloroethylene (PCE), it was concluded that: (1) Freundlich's model proved adequate to relate the contaminant contents in the gas phase with those in the solid and non-aqueous liquid phases; (2) the distribution of the contaminants in the different phases present in the soil could be estimated with differences lower than 10% in 83% of the cases; and (3) an increase in the soil water content led to a decrease in the amount of contaminant in the solid and non-aqueous liquid phases and an increase in the amount in the other phases.
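Freundlich's model has the general form q = K_F · C^(1/n), which becomes a straight line after taking logarithms. The sketch below fits the two parameters by ordinary least squares on the log-transformed values. It assumes the gas-phase content C is the independent variable and q is the content in the solid and non-aqueous liquid phases; that assignment, the units, the function names and the synthetic data points are illustrative assumptions, not values from the work.

```python
import math


def freundlich(c_gas, k_f, n):
    """Freundlich isotherm: q = K_F * C**(1/n)."""
    return k_f * c_gas ** (1.0 / n)


def fit_freundlich(c_values, q_values):
    """Estimate (K_F, n) from ln q = ln K_F + (1/n) ln C by least squares."""
    xs = [math.log(c) for c in c_values]
    ys = [math.log(q) for q in q_values]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), 1.0 / slope


# Synthetic, noise-free points generated from assumed parameters
c_gas = [0.5, 1.0, 2.0, 4.0, 8.0]
q_obs = [freundlich(c, k_f=2.0, n=1.5) for c in c_gas]
k_f_hat, n_hat = fit_freundlich(c_gas, q_obs)
```

On noise-free data the fit recovers the generating parameters; with measured data, the residuals in log space indicate how well the isotherm describes the phase distribution.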
Abstract:
Clustering analysis is a useful tool to detect and monitor disease patterns and, consequently, to contribute to effective population disease management. Portugal has the highest incidence of tuberculosis in the European Union (in 2012, 21.6 cases per 100,000 inhabitants), although it has been decreasing consistently. Two critical PTB (Pulmonary Tuberculosis) areas, the metropolitan Oporto and metropolitan Lisbon regions, were previously identified through spatial and space-time clustering of PTB incidence rates and risk factors. Identifying clusters of temporal trends can further inform policy makers about which municipalities show faster or slower improvement in TB control.
Abstract:
Research on the problem of feature selection for clustering continues to develop. This is a challenging task, mainly due to the absence of class labels to guide the search for relevant features. Categorical feature selection for clustering has rarely been addressed in the literature, with most of the proposed approaches having focused on numerical data. In this work, we propose an approach to simultaneously cluster categorical data and select a subset of relevant features. Our approach is based on a modification of a finite mixture model (of multinomial distributions), where a set of latent variables indicates the relevance of each feature. To estimate the model parameters, we implement a variant of the expectation-maximization algorithm that simultaneously selects the subset of relevant features, using a minimum message length criterion. The proposed approach compares favourably with two baseline methods: a filter based on an entropy measure and a wrapper based on mutual information. The results obtained on synthetic data illustrate the ability of the proposed expectation-maximization method to recover the ground truth. An application to real data, drawn from official statistics, shows its usefulness.
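The model underneath this approach is a finite mixture of multinomial distributions fitted by expectation-maximization. The sketch below shows only that plain baseline in Python, for categorical features coded as integers; it does not include the feature-relevance latent variables or the minimum message length criterion that the abstract describes. The function names, the tiny pseudo-count `EPS` and the example data are assumptions made for illustration.

```python
import math
import random

EPS = 1e-9  # tiny pseudo-count (assumption) that keeps probabilities above zero


def e_step(data, weights, theta):
    """Return the log-likelihood and the per-row cluster responsibilities."""
    log_lik, resp = 0.0, []
    for row in data:
        joint = [
            w * math.prod(theta[c][j][v] for j, v in enumerate(row))
            for c, w in enumerate(weights)
        ]
        total = sum(joint)
        log_lik += math.log(total)
        resp.append([p / total for p in joint])
    return log_lik, resp


def m_step(data, resp, n_levels, k):
    """Re-estimate mixing weights and per-feature category probabilities."""
    weights, theta = [], []
    for c in range(k):
        n_c = sum(r[c] for r in resp)
        weights.append(n_c / len(data))
        component = []
        for j, levels in enumerate(n_levels):
            counts = [EPS] * levels
            for row, r in zip(data, resp):
                counts[row[j]] += r[c]
            norm = sum(counts)
            component.append([x / norm for x in counts])
        theta.append(component)
    return weights, theta


def fit(data, n_levels, k, n_iter=30, seed=0):
    """Run EM from a random start; returns weights, theta and the log-likelihood trace."""
    rng = random.Random(seed)
    weights = [1.0 / k] * k
    theta = []
    for _ in range(k):
        component = []
        for levels in n_levels:
            raw = [rng.random() + 0.5 for _ in range(levels)]
            component.append([x / sum(raw) for x in raw])
        theta.append(component)
    history = []
    for _ in range(n_iter):
        log_lik, resp = e_step(data, weights, theta)
        history.append(log_lik)
        weights, theta = m_step(data, resp, n_levels, k)
    return weights, theta, history


# Hypothetical data: 8 observations, 3 binary features
data = [(0, 0, 1), (0, 0, 0), (0, 1, 0), (0, 0, 0), (1, 1, 1), (1, 1, 0), (1, 0, 1), (1, 1, 1)]
weights, theta, history = fit(data, n_levels=[2, 2, 2], k=2)
```

The log-likelihood trace in `history` should not decrease from one iteration to the next, apart from a negligible effect of the pseudo-count, which is a convenient check on any EM implementation.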
Abstract:
Research on cluster analysis for categorical data continues to develop, with new clustering algorithms being proposed. However, in this context, the determination of the number of clusters is rarely addressed. We propose a new approach in which clustering and the estimation of the number of clusters are performed simultaneously for categorical data. We assume that the data originate from a finite mixture of multinomial distributions and use a minimum message length (MML) criterion to select the number of clusters (Wallace and Bolton, 1986). For this purpose, we implement an EM-type algorithm (Silvestre et al., 2008) based on the approach of Figueiredo and Jain (2002). The novelty of the approach rests on the integration of the model estimation and the selection of the number of clusters in a single algorithm, rather than selecting this number from a set of pre-estimated candidate models. The performance of our approach is compared with the use of the Bayesian Information Criterion (BIC) (Schwarz, 1978) and the Integrated Completed Likelihood (ICL) (Biernacki et al., 2000) using synthetic data. The results obtained illustrate the capacity of the proposed algorithm to attain the true number of clusters while outperforming BIC and ICL in speed, which is especially relevant when dealing with large data sets.
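The BIC baseline mentioned here works on pre-estimated candidate models: each candidate number of clusters is fitted separately and the one with the lowest BIC is kept. The sketch below shows that selection step in Python for a mixture of multinomials, using the standard definition BIC = -2 ln L + p ln N. It illustrates the baseline only, not the proposed MML algorithm, and the log-likelihood values are hypothetical placeholders rather than results from the paper.

```python
import math


def n_free_parameters(k, n_levels):
    """Free parameters of a k-component mixture of multinomials.

    (k - 1) mixing weights plus, for each component and feature,
    one probability per category minus one.
    """
    return (k - 1) + k * sum(levels - 1 for levels in n_levels)


def bic(log_likelihood, k, n_levels, n_obs):
    """Bayesian Information Criterion; lower values are preferred."""
    return -2.0 * log_likelihood + n_free_parameters(k, n_levels) * math.log(n_obs)


def select_k_by_bic(candidate_log_liks, n_levels, n_obs):
    """Pick the number of clusters with the smallest BIC among fitted candidates."""
    scores = {k: bic(ll, k, n_levels, n_obs) for k, ll in candidate_log_liks.items()}
    return min(scores, key=scores.get), scores


# Hypothetical maximized log-likelihoods for k = 1, 2, 3 on 500 observations
candidates = {1: -1700.0, 2: -1500.0, 3: -1495.0}
best_k, scores = select_k_by_bic(candidates, n_levels=[3, 4, 2], n_obs=500)
```

Every candidate requires its own full model fit before the comparison can be made, which is the cost that a single-run approach avoids.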