927 results for TRAINING SET
Abstract:
Thecamoebians were examined from 123 surface sediment samples collected from 45 lakes in the Greater Toronto Area (GTA) and the surrounding region to i) elucidate the controls on faunal distribution in modern lake environments; and ii) consider the utility of thecamoebians in quantitative studies of water quality change. This area was chosen because it includes a high density of lakes that are threatened by urban development and where water quality has deteriorated locally as a result of contaminant inputs, particularly nutrients. Canonical Correspondence Analysis (CCA) and a series of partial CCAs were used to examine species-environment relationships. Twenty-four environmental variables were considered, including water properties (e.g. pH, DO, conductivity), substrate characteristics, nutrient loading, and environmentally available metals. The thecamoebian assemblages showed a strong association with Olsen's phosphorus, reflecting the eutrophic status of many of the lakes, and locally with elevated conductivity measurements, which appear to reflect road salt inputs associated with winter de-icing operations. A transfer function was developed for Olsen P from this training set based on weighted averaging with inverse deshrinking (WA Inv). The model was applied to infer past changes in phosphorus enrichment in core samples from several lakes, including eutrophic Haynes Lake within the GTA. Thecamoebian-inferred changes in sedimentary phosphorus from a 210Pb-dated core from Haynes Lake are related to i) the widespread introduction of chemical fertilizers to agricultural land in the post-WWII era; ii) a steep decline in phosphorus with a change in agricultural practices in the late 1970s; and iii) the construction of a golf course in close proximity to the lake in the early 1990s. This preliminary study confirms that thecamoebians have considerable potential as indicators of eutrophication in lakes and can provide an estimate of baseline conditions.
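As an illustration of the modelling step described above, the following is a minimal sketch of weighted averaging with inverse deshrinking, assuming a matrix of thecamoebian relative abundances and measured Olsen P values for the training lakes; the arrays are synthetic placeholders rather than the authors' data, and infer_olsen_p only outlines the general technique, not the published transfer function.

    import numpy as np

    # Synthetic training set: rows = surface samples, columns = thecamoebian taxa
    rng = np.random.default_rng(0)
    abund = rng.random((123, 10))                # relative abundances (placeholder)
    abund /= abund.sum(axis=1, keepdims=True)
    olsen_p = rng.uniform(5, 80, size=123)       # measured Olsen P (placeholder values)

    # Step 1: species optima = abundance-weighted average of Olsen P over all samples
    optima = (abund * olsen_p[:, None]).sum(axis=0) / abund.sum(axis=0)

    # Step 2: initial inferences = abundance-weighted average of the optima per sample
    initial = (abund * optima).sum(axis=1) / abund.sum(axis=1)

    # Step 3: inverse deshrinking - regress the observed values on the initial
    # inferences and use that line to rescale them (classical WA "inverse" deshrinking)
    b1, b0 = np.polyfit(initial, olsen_p, 1)
    deshrunk = b0 + b1 * initial

    def infer_olsen_p(core_abund):
        """Apply the calibrated model to fossil (core) assemblages."""
        raw = (core_abund * optima).sum(axis=1) / core_abund.sum(axis=1)
        return b0 + b1 * raw

    print("apparent RMSE:", np.sqrt(np.mean((deshrunk - olsen_p) ** 2)))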
Abstract:
An algorithm based only on the impedance cardiogram (ICG) recorded through two defibrillation pads, using the strongest frequency component and its amplitude, and incorporated into a defibrillator, could determine circulatory arrest and reduce delays in starting cardiopulmonary resuscitation (CPR). Frequency analysis of the ICG signal is carried out by integer filters on a sample-by-sample basis. They are simpler, lighter and more versatile than the FFT. This alternative approach, although less accurate, is preferred because the limited processing capacity of the devices could compromise the real-time usability of the FFT. The two techniques were compared across a data set comprising 13 cases of cardiac arrest and 6 normal controls. The best filters were refined on this training set, and an algorithm for the detection of cardiac arrest was trained on a wider data set. The algorithm was finally tested on a validation set. The ICG was recorded in 132 cardiac arrest patients (53 training, 79 validation) and 97 controls (47 training, 50 validation): the diagnostic algorithm indicated cardiac arrest with a sensitivity of 81.1% (77.6-84.3) and a specificity of 97.1% (96.7-97.4) for the validation set (95% confidence intervals). Automated defibrillators with integrated ICG analysis have the potential to improve emergency care by lay persons by enabling more rapid and appropriate initiation of CPR, and when combined with ECG analysis they could improve the detection of cardiac arrest.
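As a rough illustration of the two techniques being compared, the sketch below estimates the dominant frequency of a simulated segment once with the FFT and once with a small bank of integer-coefficient delay-and-subtract (comb) filters applied sample by sample; the waveform, sampling rate and candidate frequencies are assumptions, and the filter bank is a generic stand-in for, not a reproduction of, the algorithm embedded in the defibrillator.

    import numpy as np

    fs = 250                                   # sampling rate in Hz (assumed)
    t = np.arange(0, 8, 1 / fs)
    icg = np.sin(2 * np.pi * 1.2 * t) + 0.3 * np.random.default_rng(1).normal(size=t.size)

    # FFT approach: take the strongest spectral component directly
    spec = np.abs(np.fft.rfft(icg))
    freqs = np.fft.rfftfreq(icg.size, 1 / fs)
    f_fft = freqs[1:][np.argmax(spec[1:])]     # skip the DC bin

    # Integer-filter approach: for each candidate frequency use a comb filter
    # y[n] = x[n] - x[n - M]; its first passband peak sits near fs / (2 * M),
    # so only an integer delay and a subtraction are needed per sample.
    candidates = np.arange(0.5, 4.0, 0.1)      # plausible circulation frequencies (Hz)
    amplitudes = []
    for f in candidates:
        M = max(1, int(round(fs / (2 * f))))
        y = icg[M:] - icg[:-M]                 # sample-by-sample delay and subtract
        amplitudes.append(np.sqrt(np.mean(y ** 2)))
    f_comb = candidates[int(np.argmax(amplitudes))]

    print(f"FFT estimate: {f_fft:.2f} Hz, integer-filter estimate: {f_comb:.2f} Hz")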
Abstract:
In this paper we present TANC, a tree-augmented naive credal classifier based on imprecise probabilities; it models prior near-ignorance via the Extreme Imprecise Dirichlet Model (EDM) (Cano et al., 2007) and deals conservatively with missing data in the training set, without assuming them to be missing at random. The EDM is an approximation of the global Imprecise Dirichlet Model (IDM) that considerably simplifies the computation of upper and lower probabilities; yet, having been only recently introduced, the quality of the approximation it provides still needs to be verified. As a first contribution, we extensively compare the output of the naive credal classifier (one of the few cases in which the global IDM can be exactly implemented) when learned with the EDM and with the global IDM; the output of the classifier appears to be identical in the vast majority of cases, thus supporting the adoption of the EDM in real classification problems. Then, through experiments, we show that TANC is more reliable than the precise TAN (learned with a uniform prior) and that it performs better than a previous TAN model based on imprecise probabilities (Zaffalon, 2003). TANC treats missing data by considering all possible completions of the training set while avoiding an exponential increase in computation time; finally, we present some preliminary results with missing data.
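For readers unfamiliar with the imprecise Dirichlet model underlying TANC, the sketch below shows how class counts yield lower and upper probabilities and how interval dominance produces a possibly indeterminate set of classes; it is a generic IDM illustration with an assumed hyperparameter s, not the TANC or EDM implementation itself.

    import numpy as np

    def idm_intervals(counts, s=1.0):
        """Lower/upper class probabilities under the imprecise Dirichlet model."""
        counts = np.asarray(counts, dtype=float)
        n = counts.sum()
        lower = counts / (n + s)
        upper = (counts + s) / (n + s)
        return lower, upper

    def credal_prediction(counts, s=1.0):
        """Return the classes that are not interval-dominated by another class."""
        lower, upper = idm_intervals(counts, s)
        keep = []
        for c in range(len(counts)):
            # c is dominated if some other class has a lower probability
            # exceeding c's upper probability
            dominated = any(lower[d] > upper[c] for d in range(len(counts)) if d != c)
            if not dominated:
                keep.append(c)
        return keep

    # With few observations the classifier stays cautious
    print(credal_prediction([3, 2]))      # both classes retained -> indeterminate
    print(credal_prediction([80, 20]))    # class 0 dominates -> determinate answer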
Abstract:
Efficient identification and follow-up of astronomical transients is hindered by the need for humans to manually select promising candidates from data streams that contain many false positives. These artefacts arise in the difference images that are produced by most major ground-based time-domain surveys with large-format CCD cameras. This dependence on humans to reject bogus detections is unsustainable for next-generation all-sky surveys, and significant effort is now being invested to solve the problem computationally. In this paper, we explore a simple machine learning approach to real-bogus classification by constructing a training set from the image data of approximately 32 000 real astrophysical transients and bogus detections from the Pan-STARRS1 Medium Deep Survey. We derive our feature representation from the pixel intensity values of a 20 x 20 pixel stamp around the centre of each candidate. This differs from previous work in that it works directly on the pixels rather than relying on catalogued domain knowledge for feature design or selection. Three machine learning algorithms are trained (artificial neural networks, support vector machines and random forests) and their performances are tested on a held-out subset of 25 per cent of the training data. We find the best results from the random forest classifier and demonstrate that, at an accepted false positive rate of 1 per cent, the classifier initially suggests a missed detection rate of around 10 per cent. However, we also find that a combination of bright star variability, nuclear transients and uncertainty in human labelling means that our best estimate of the missed detection rate is approximately 6 per cent.
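The pipeline described above can be outlined in a few lines: flatten the 20 x 20 pixel stamps, hold out 25 per cent, train a random forest, and read the missed detection rate off at a 1 per cent false positive rate. The stamps and labels below are random placeholders standing in for the Pan-STARRS1 cutouts.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Placeholder data: the stamps would be cut from the difference images
    rng = np.random.default_rng(0)
    stamps = rng.normal(size=(2000, 20, 20))      # 20 x 20 pixel stamps
    labels = rng.integers(0, 2, size=2000)        # 1 = real transient, 0 = bogus

    X = stamps.reshape(len(stamps), -1)           # feature vector = raw pixel values
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.25, random_state=0, stratify=labels)

    clf = RandomForestClassifier(n_estimators=500, random_state=0, n_jobs=-1)
    clf.fit(X_train, y_train)

    # Choose the decision threshold that gives a 1 per cent false positive rate,
    # then report the corresponding missed detection rate on the held-out 25%.
    scores = clf.predict_proba(X_test)[:, 1]
    bogus_scores = np.sort(scores[y_test == 0])
    threshold = bogus_scores[int(0.99 * len(bogus_scores))]
    mdr = np.mean(scores[y_test == 1] < threshold)
    print(f"missed detection rate at 1% FPR: {mdr:.3f}")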
Abstract:
Tumor recurrence after curative resection remains a major problem in patients with locally advanced colorectal cancer treated with adjuvant chemotherapy. Genetic single-nucleotide polymorphisms (SNPs) may serve as useful molecular markers to predict clinical outcomes in these patients and identify targets for future drug development. Recent in vitro and in vivo studies have demonstrated that the plastin genes PLS3 and LCP1 are overexpressed in colon cancer cells and play an important role in tumor cell invasion, adhesion, and migration. Hence, we hypothesized that functional genetic variations of plastin may have direct effects on the progression and prognosis of locally advanced colorectal cancer. We tested whether functional tagging polymorphisms of PLS3 and LCP1 predict time to tumor recurrence (TTR) in 732 patients (training set, 234; validation set, 498) with stage II/III colorectal cancer. The PLS3 rs11342 and LCP1 rs4941543 polymorphisms were associated with a significantly increased risk of recurrence in the training set. PLS3 rs6643869 showed a consistent association with TTR in the training and validation sets when stratified by gender and tumor location. Female patients with the PLS3 rs6643869 AA genotype had the shortest median TTR compared with those carrying any G allele in the training set [1.7 vs. 9.4 years; HR, 2.84; 95% confidence interval (CI), 1.32-6.1; P = 0.005] and validation set (3.3 vs. 13.7 years; HR, 2.07; 95% CI, 1.09-3.91; P = 0.021). Our findings suggest that several SNPs of the PLS3 and LCP1 genes could serve as gender- and/or stage-specific molecular predictors of tumor recurrence in stage II/III patients with colorectal cancer, as well as potential therapeutic targets.
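The genotype-outcome associations reported above (hazard ratios with confidence intervals for time to recurrence) are the kind of result usually obtained with a Cox proportional hazards model; the sketch below, using the lifelines package with invented column names and data, shows only the general form of estimating a hazard ratio for a genotype group, not the authors' statistical protocol.

    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    # Invented example data: time to tumor recurrence (years), event indicator,
    # and a 0/1 flag for carrying the PLS3 rs6643869 AA genotype.
    rng = np.random.default_rng(0)
    n = 234
    df = pd.DataFrame({
        "ttr_years": rng.exponential(8, size=n),
        "recurrence": rng.integers(0, 2, size=n),
        "aa_genotype": rng.integers(0, 2, size=n),
    })

    cph = CoxPHFitter()
    cph.fit(df, duration_col="ttr_years", event_col="recurrence")
    print(cph.summary)    # exp(coef) for aa_genotype is the estimated hazard ratio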
Abstract:
Conventional practice in regional geochemistry includes, as a final step of any geochemical campaign, the generation of a series of maps showing the spatial distribution of each of the components considered. Such maps, though necessary, do not comply with the compositional, relative nature of the data, which unfortunately makes any conclusion based on them sensitive to spurious correlation problems. This is one of the reasons why these maps are never interpreted in isolation. This contribution gathers a series of statistical methods to produce individual maps of multiplicative combinations of components (logcontrasts), much in the flavor of equilibrium constants, which are designed on purpose to capture certain aspects of the data. We distinguish between supervised and unsupervised methods, where the former require an external, non-compositional variable (besides the compositional geochemical information) available in an analogous training set. This external variable can be a quantity (soil density, collocated magnetics, collocated ratio of Th/U spectral gamma counts, proportion of clay particle fraction, etc.) or a category (rock type, land use type, etc.). In the supervised methods, a regression-like model between the external variable and the geochemical composition is derived on the training set, and this model is then mapped onto the whole region. This case is illustrated with the Tellus dataset, covering Northern Ireland at a density of one soil sample per 2 square km, where we map the presence of blanket peat and the underlying geology. The unsupervised methods considered include principal components and principal balances (Pawlowsky-Glahn et al., CoDaWork2013), i.e. logcontrasts of the data that are devised to capture very large variability or else to be quasi-constant. Using the Tellus dataset again, it is found that geological features are highlighted by the quasi-constant ratios Hf/Nb and their ratio against SiO2; Rb/K2O and Zr/Na2O and the balance between these two groups of two variables; the balance of Al2O3 and TiO2 vs. MgO; and the balance of Cr, Ni and Co vs. V and Fe2O3. The largest variability appears to be related to the presence/absence of peat.
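To make the notion of a logcontrast or balance concrete, the sketch below computes the balance of Cr, Ni and Co against V and Fe2O3 and the Hf/Nb logratio for a compositional data matrix, following the usual isometric log-ratio normalisation; the values are placeholders rather than Tellus samples.

    import numpy as np
    import pandas as pd

    def balance(df, num_parts, den_parts):
        """Isometric log-ratio balance between two groups of components:
        b = sqrt(r*s/(r+s)) * ln(gmean(numerator parts) / gmean(denominator parts))."""
        r, s = len(num_parts), len(den_parts)
        g_num = np.exp(np.log(df[num_parts]).mean(axis=1))
        g_den = np.exp(np.log(df[den_parts]).mean(axis=1))
        return np.sqrt(r * s / (r + s)) * np.log(g_num / g_den)

    # Placeholder geochemical compositions; real data would be the Tellus samples
    rng = np.random.default_rng(0)
    cols = ["Cr", "Ni", "Co", "V", "Fe2O3", "Hf", "Nb"]
    geo = pd.DataFrame(rng.uniform(1, 100, size=(500, len(cols))), columns=cols)

    # Balance of Cr, Ni and Co against V and Fe2O3, plus the quasi-constant Hf/Nb logratio
    geo["bal_CrNiCo_vs_VFe"] = balance(geo, ["Cr", "Ni", "Co"], ["V", "Fe2O3"])
    geo["logratio_Hf_Nb"] = np.log(geo["Hf"] / geo["Nb"])
    print(geo[["bal_CrNiCo_vs_VFe", "logratio_Hf_Nb"]].describe())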
Abstract:
In previous papers by the authors, fuzzy model identification methods were discussed. A bacterial algorithm for extracting a fuzzy rule base from a training set was presented, and the Levenberg-Marquardt algorithm was proposed for determining the membership functions in fuzzy systems. In this paper the Levenberg-Marquardt technique is improved to optimise the membership functions in the fuzzy rules without a Ruspini partition. The class of membership functions investigated is the trapezoidal one, as it is general enough and widely used. The method can easily be extended to arbitrary piecewise-linear functions as well.
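A minimal sketch of the fitting step, assuming a two-rule zero-order Sugeno-style system with trapezoidal antecedents on a single input, is given below; scipy's Levenberg-Marquardt solver is used as a stand-in for the authors' implementation, and the training data are synthetic.

    import numpy as np
    from scipy.optimize import least_squares

    def trapezoid(x, a, b, c, d):
        a, b, c, d = np.sort([a, b, c, d])           # keep breakpoints ordered
        rising = (x - a) / max(b - a, 1e-9)
        falling = (d - x) / max(d - c, 1e-9)
        return np.clip(np.minimum(rising, falling), 0.0, 1.0)

    def fuzzy_output(params, x):
        """Two-rule system: weighted average of rule consequents y1, y2."""
        a1, b1, c1, d1, a2, b2, c2, d2, y1, y2 = params
        mu1 = trapezoid(x, a1, b1, c1, d1)
        mu2 = trapezoid(x, a2, b2, c2, d2)
        return (mu1 * y1 + mu2 * y2) / (mu1 + mu2 + 1e-9)

    # Training data from an unknown target function (placeholder)
    x = np.linspace(0.0, 10.0, 200)
    target = np.where(x < 5.0, 1.0, 3.0) + 0.05 * np.random.default_rng(0).normal(size=x.size)

    residuals = lambda p: fuzzy_output(p, x) - target
    p0 = np.array([0.0, 1.0, 4.0, 5.0, 5.0, 6.0, 9.0, 10.0, 0.0, 1.0])
    fit = least_squares(residuals, p0, method="lm")   # Levenberg-Marquardt
    print("optimised parameters:", np.round(fit.x, 2))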
Abstract:
This experimental study focuses on a detection system at the seismic station level intended to play a role similar to that of detection algorithms based on the STA/LTA ratio. We tested two types of classifier, Multi-Layer Perceptrons and Support Vector Machines, trained in supervised mode. The data set consisted of 2903 patterns extracted from records of the PVAQ station of the seismographic network of the Institute of Meteorology of Portugal. The spectral characteristics of the records and their variation in time were reflected in the input patterns, which consist of a set of power spectral density values at selected frequencies, extracted from a spectrogram calculated over a record segment of pre-determined duration. The data were divided, with about 60% used for training and the remainder reserved for testing and validation. To ensure that all patterns in the data set were within the range of variation of the training set, we used an algorithm that partitions the data with hyper-convex polyhedra, thereby determining a set of patterns that must form part of the training set. Additionally, an active learning strategy was conducted by iteratively incorporating poorly classified cases into the training set. The best results, in terms of sensitivity and selectivity on the whole data set, ranged between 98% and 100%, which compares very favorably with the roughly 50% obtained by the existing detection system.
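The feature extraction and the two classifiers can be outlined as follows; the waveform, the selected frequencies and the labels are synthetic placeholders rather than PVAQ records, but the structure (power spectral density values at chosen frequencies as inputs to an MLP and an SVM trained in supervised mode) follows the description above.

    import numpy as np
    from scipy.signal import welch
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC

    fs = 100.0                                      # sampling rate (assumed)
    sel_freqs = np.array([1, 2, 4, 8, 12, 16, 20])  # feature frequencies (assumed)

    def psd_features(segment, fs, sel_freqs):
        """Power spectral density sampled at the selected frequencies (in dB)."""
        f, pxx = welch(segment, fs=fs, nperseg=256)
        return np.interp(sel_freqs, f, 10 * np.log10(pxx + 1e-12))

    # Placeholder records: noise vs. noise plus a transient burst ("event")
    rng = np.random.default_rng(0)
    segments, labels = [], []
    for _ in range(400):
        seg = rng.normal(size=1024)
        label = int(rng.integers(0, 2))
        if label:
            seg[400:500] += 5 * np.sin(2 * np.pi * 8 * np.arange(100) / fs)
        segments.append(psd_features(seg, fs, sel_freqs))
        labels.append(label)

    X, y = np.array(segments), np.array(labels)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.6, random_state=0)

    for model in (MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
                  SVC(kernel="rbf", C=10.0)):
        model.fit(X_tr, y_tr)
        print(type(model).__name__, "test accuracy:", model.score(X_te, y_te))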
Abstract:
This paper presents a method of using the so-called "bacterial algorithm" (4,5) for extracting a fuzzy rule base from a training set. The newly proposed bacterial evolutionary algorithm (BEA) is introduced. In our application, one bacterium corresponds to a fuzzy rule system.
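A generic skeleton of a bacterial evolutionary algorithm (bacterial mutation plus gene transfer) is sketched below on an abstract parameter vector; encoding a full fuzzy rule base as a bacterium, as done in the paper, would replace the toy fitness function and chromosome layout used here.

    import numpy as np

    rng = np.random.default_rng(0)

    def fitness(bacterium):
        """Toy objective standing in for rule-base quality on the training set."""
        return -np.sum((bacterium - 0.5) ** 2)

    def bacterial_mutation(bacterium, n_clones=4, seg_len=2):
        """Clone the bacterium, mutate one random segment per clone, keep the best."""
        best = bacterium.copy()
        for _ in range(n_clones):
            clone = best.copy()
            start = rng.integers(0, len(clone) - seg_len + 1)
            clone[start:start + seg_len] = rng.random(seg_len)
            if fitness(clone) > fitness(best):
                best = clone
        return best

    def gene_transfer(population):
        """Copy a random segment from a better bacterium into a worse one."""
        population.sort(key=fitness, reverse=True)
        half = len(population) // 2
        donor = population[rng.integers(0, half)]
        receiver_idx = rng.integers(half, len(population))
        start = rng.integers(0, len(donor) - 1)
        population[receiver_idx][start:start + 2] = donor[start:start + 2]
        return population

    # One bacterium = one candidate fuzzy rule system (here: a flat parameter vector)
    population = [rng.random(10) for _ in range(10)]
    for generation in range(50):
        population = [bacterial_mutation(b) for b in population]
        population = gene_transfer(population)
    print("best fitness:", fitness(max(population, key=fitness)))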
Abstract:
This study describes the on-line operation of a seismic detection system that acts at the level of a seismic station, providing a role similar to that of STA/LTA ratio-based detection algorithms. The intelligent detector is a Support Vector Machine (SVM), trained with data consisting of 2903 patterns extracted from records of the PVAQ station, one of the stations of the seismographic network of the Institute of Meteorology of Portugal (IM). The records' spectral characteristics and their variation in time were reflected in the SVM input patterns, as a set of power spectral density values at selected frequencies. To ensure that all patterns of the sample data were within the range of variation of the training set, we used an algorithm that partitions the data with hyper-convex polyhedra, thereby determining a set of patterns that must form part of the training set. Additionally, an active learning strategy was conducted by iteratively incorporating poorly classified cases into the training set. After training, the proposed system was tested in continuous operation on unseen (out-of-sample) data, and the SVM detector obtained 97.7% sensitivity and 98.7% selectivity. The same type of detector presented 88.4% sensitivity and 99.4% selectivity when applied to data from a different seismic station of the IM.
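The active learning step mentioned above can be written as a short loop in which the cases the current SVM classifies poorly are folded into the training set; the features, labels and batch size below are placeholders, and the initial "mandatory" subset merely stands in for the hyper-convex polyhedron selection.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_pool = rng.normal(size=(1000, 7))            # placeholder feature vectors
    y_pool = (X_pool[:, 0] + 0.3 * rng.normal(size=1000) > 0).astype(int)

    # Start from a small mandatory training set (stand-in for the polyhedron selection)
    train_idx = list(range(50))
    pool_idx = list(range(50, 1000))

    svm = SVC(kernel="rbf", C=10.0)
    for round_ in range(5):
        svm.fit(X_pool[train_idx], y_pool[train_idx])
        preds = svm.predict(X_pool[pool_idx])
        wrong = [i for i, p in zip(pool_idx, preds) if p != y_pool[i]]
        if not wrong:
            break
        # Incorporate a batch of poorly classified cases into the training set
        batch = set(wrong[:20])
        train_idx.extend(batch)
        pool_idx = [i for i in pool_idx if i not in batch]
        print(f"round {round_}: {len(wrong)} misclassified, training set size {len(train_idx)}")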
Abstract:
Senior thesis written for Oceanography 445
Abstract:
In recent years, power systems have experienced many changes in their paradigm. The introduction of new players in the management of distributed generation leads to the decentralization of control and decision-making, so that each player is able to act in the market environment. In this new context, aggregator players that allow medium-size, small and micro players to act in a competitive environment will be very relevant. In order to achieve their objectives, virtual power players and single players are required to optimize their energy resource management process. To achieve this, it is essential to have the financial resources needed to access appropriate decision support tools. As small players have difficulty in accessing such tools, they need alternative methodologies to support their decisions. This paper presents a methodology, based on Artificial Neural Networks (ANN), intended to support smaller players. The methodology uses a training set created from energy resource scheduling solutions obtained with a mixed-integer linear programming (MIP) approach as the reference optimization methodology. The trained network is used to obtain locational marginal prices in a distribution network. The main goal of the paper is to verify the accuracy of the ANN-based approach. Moreover, the use of a single ANN is compared with the use of two or more ANNs to forecast the locational marginal price.
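The ANN stage of such a methodology might look like the sketch below: a regressor is trained on scenario descriptions paired with locational marginal prices taken from the reference MIP solutions (placeholder arrays here), and its accuracy is then checked against the reference values; the network size and data are assumptions, not those of the paper.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor
    from sklearn.metrics import mean_absolute_error

    # Placeholder training set: each row describes a scheduling scenario
    # (loads, distributed generation, prices); each target is the LMP at one bus
    # taken from the reference MIP optimisation results.
    rng = np.random.default_rng(0)
    scenarios = rng.uniform(0, 1, size=(500, 12))
    lmp_from_mip = scenarios @ rng.uniform(20, 60, size=12) / 12 + rng.normal(0, 1, 500)

    X_tr, X_te, y_tr, y_te = train_test_split(scenarios, lmp_from_mip,
                                              test_size=0.2, random_state=0)
    ann = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=5000, random_state=0)
    ann.fit(X_tr, y_tr)
    print("MAE vs. MIP reference prices:", mean_absolute_error(y_te, ann.predict(X_te)))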
Abstract:
The future scenarios for the operation of smart grids are likely to include a large diversity of players of different types and sizes. With control and decision-making being decentralized over the network, intelligence should also be decentralized so that every player is able to act in the market environment. In this new context, aggregator players, enabling medium, small, and even micro-size players to act in a competitive environment, will be very relevant. Virtual Power Players (VPP) and single players must optimize their energy resource management in order to accomplish their goals. This is relatively easy for larger players, which have the financial means to access adequate decision support tools for determining their optimal resource schedule. However, smaller players have difficulty in accessing this kind of tool, so they must be offered alternative methods to support their decisions. This paper presents a methodology, based on Artificial Neural Networks (ANN), intended to support smaller players' resource scheduling. The methodology uses a training set built from the energy resource scheduling solutions obtained with a reference optimization methodology, in this case mixed-integer non-linear programming (MINLP). The trained network is able to achieve good scheduling results while requiring modest computational means.
Abstract:
This report describes the work carried out during the curricular internship included in the study cycle of the Master's degree in Civil Engineering at the Instituto Superior de Engenharia do Porto (ISEP). The internship, carried out in a company environment, lasted six months, beginning in February and ending in July 2015, and took place at ENESCOORD – Coordenação e Gestão de Projetos e Obras, Lda., with the aim of obtaining the degree of Master in Civil Engineering, Construction Management branch. This report provides the context for the work and presents the company where the internship took place. In order to understand the legal obligations to which construction supervision is subject, the legislation applicable to the supervision's duties during the course of a construction project is addressed. The document covers the topic of construction supervision and its responsibilities during the works, presenting the work carried out by the intern as a member of the supervision team. During the internship, the intern worked as a trainee supervision engineer on the SOGENAVE contract, which consisted of the extension of the company's facilities in order to increase storage space and bring together the companies of the Trivalor group, to which SOGENAVE belongs. The report addresses the supervision control areas applied during the works, analysing in particular the control of quality, costs, deadlines, safety, and the changes introduced to the design. Finally, some concluding remarks on the work carried out are presented.
Abstract:
Ship tracking systems allow Maritime Organizations concerned with safety at sea to obtain information on the current location and route of merchant vessels. Thanks to space technology, the geographical coverage of ship tracking platforms has increased significantly in recent years, from radar-based near-shore traffic monitoring towards a worldwide picture of the maritime traffic situation. The long-range tracking systems currently in operation allow the storage of ship position data over many years: a valuable source of knowledge about the shipping routes between different ocean regions. The outcome of this Master's project is a software prototype for the estimation of the most operated shipping route between any two geographical locations. The analysis is based on historical ship positions acquired with long-range tracking systems. The proposed approach applies a Genetic Algorithm to a training set of relevant ship positions extracted from the long-term tracking database of the European Maritime Safety Agency (EMSA). The analysis of some representative shipping routes is presented, and the quality of the results and their operational applications are assessed by a Maritime Safety expert.
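A minimal genetic algorithm of the kind described might look as follows: a route is encoded as a fixed number of intermediate waypoints, and the fitness rewards staying close to historical ship positions while keeping the route short; all coordinates, weights and GA settings are invented for illustration and do not reflect the EMSA prototype.

    import numpy as np

    rng = np.random.default_rng(0)

    start = np.array([0.0, 0.0])                 # departure location (lon, lat, invented)
    end = np.array([10.0, 5.0])                  # destination (invented)
    history = rng.normal(loc=np.linspace(start, end, 200), scale=0.4)  # historical positions

    N_WAYPOINTS, POP, GENS = 6, 60, 200

    def route(waypoints):
        return np.vstack([start, waypoints.reshape(N_WAYPOINTS, 2), end])

    def fitness(waypoints):
        pts = route(waypoints)
        length = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
        # distance from each historical position to its nearest route point
        d = np.linalg.norm(history[:, None, :] - pts[None, :, :], axis=2).min(axis=1)
        return -(length + 2.0 * d.mean())        # shorter and closer to traffic is better

    population = [(start + (end - start) * rng.random((N_WAYPOINTS, 1))
                   + rng.normal(0, 1, (N_WAYPOINTS, 2))).ravel() for _ in range(POP)]

    for _ in range(GENS):
        population.sort(key=fitness, reverse=True)
        parents = population[:POP // 2]
        children = []
        while len(children) < POP - len(parents):
            a, b = rng.choice(len(parents), 2, replace=False)
            cut = rng.integers(1, N_WAYPOINTS * 2 - 1)       # one-point crossover
            child = np.concatenate([parents[a][:cut], parents[b][cut:]])
            child += rng.normal(0, 0.1, child.shape)          # mutation
            children.append(child)
        population = parents + children

    best = max(population, key=fitness)
    print("estimated route waypoints:\n", route(best).round(2))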