957 results for Imbalanced datasets
Abstract:
Sediment contaminants have been monitored in Milford Haven Waterway (MHW) since 1978 (hydrocarbons) and 1982 (metals), with the aim of providing surveillance of environmental quality in one of the UK’s busiest oil and gas ports. This aim is particularly important during and after large-scale investment in liquefied natural gas (LNG) facilities. However, methods have inevitably changed over the years, compounding the difficulties of coordinating sampling and analytical programmes. After a review by the MHW Environmental Surveillance Group (MHWESG), sediment hydrocarbon chemistry was investigated in detail in 2010. Natural Resources Wales (NRW) contributed their MHW data for 2007 and 2012, collected to assess the condition of the Special Area of Conservation (SAC) designated under the European Union Habitats Directive. Datasets from 2007-2012 have thus been more comparable. The results showed conclusively that a MHW-wide peak in concentrations of sediment polycyclic aromatic hydrocarbons (PAHs), metals and other contaminants occurred in late 2007. This was corroborated by independent annual monitoring at one centrally-located station, with peaks in early 2008 and 2011. The spatial and temporal patterns of recovery from the 2007 peak, shown by MHW-wide surveys in 2010 and 2012, indicate several probable causes of contaminant trends: atmospheric deposition, catchment runoff, sediment resuspension from dredging, and construction of two LNG terminals and a power station. Adverse biological effects, predictable in 2007 using international sediment quality guidelines, were independently tested using data from monitoring schemes of more than a decade's duration in MHW (starfish, limpets) and in the wider SAC (grey seals). Although not proving cause and effect, many of these potential biological receptors showed a simultaneous negative response to the elevated 2007 contamination following intense dredging activity in 2006.
Wetland bird counts were typically at a peak in the winter of 2005-2006, prior to peak dredging. In the following winter (2006-2007), shelduck in Pembroke River showed their lowest winter count, and spring 2007 saw the largest ever drop in numbers of broods across MHW between successive breeding seasons. Wigeon counts in Pembroke River were again low in late 2012 after further dredging nearby. These results are strongly supported by PAH data reported previously from invertebrate bioaccumulation studies in MHW 2007-2010, themselves closely reflecting sediment
Abstract:
The impacts of various climate modes on the Red Sea surface heat exchange are investigated using the MERRA reanalysis and the OAFlux satellite reanalysis datasets. Seasonality in the atmospheric forcing is also explored. Mode impacts peak during boreal winter [December–February (DJF)] with average anomalies of 12–18 W m−2 to be found in the northern Red Sea. The North Atlantic Oscillation (NAO), the east Atlantic–west Russia (EAWR) pattern, and the Indian monsoon index (IMI) exhibit the strongest influence on the air–sea heat exchange during the winter. In this season, the largest negative anomalies of about −30 W m−2 are associated with the EAWR pattern over the central part of the Red Sea. In other seasons, mode-related anomalies are considerably lower, especially during spring when the mode impacts are negligible. The mode impacts are strongest over the northern half of the Red Sea during winter and autumn. In summer, the southern half of the basin is strongly influenced by the multivariate ENSO index (MEI). The winter mode–related anomalies are determined mostly by the latent heat flux component, while in summer the shortwave flux is also important. The influence of the modes on the Red Sea is found to be generally weaker than on the neighboring Mediterranean basin.
Abstract:
Remote sensing airborne hyperspectral data are routinely used for applications including algorithm development for satellite sensors, environmental monitoring and atmospheric studies. Single flight lines of airborne hyperspectral data are often in the region of tens of gigabytes in size. This means that a single aircraft can collect terabytes of remotely sensed hyperspectral data during a single year. Before these data can be used for scientific analyses, they need to be radiometrically calibrated, synchronised with the aircraft's position and attitude and then geocorrected. To enable efficient processing of these large datasets, the UK Airborne Research and Survey Facility has recently developed a software suite, the Airborne Processing Library (APL), for processing airborne hyperspectral data acquired from the Specim AISA Eagle and Hawk instruments. The APL toolbox allows users to radiometrically calibrate, geocorrect, reproject and resample airborne data. Each stage of the toolbox outputs data in the common Band Interleaved by Line (BIL) format, which allows its integration with other standard remote sensing software packages. APL was developed to be user-friendly and suitable for use on a workstation PC as well as for the automated processing of the facility; to this end APL can be used under both Windows and Linux environments, on a single desktop machine or through a Grid engine. A graphical user interface also exists. In this paper we describe the Airborne Processing Library software, its algorithms and approach. We present example results from using APL with an AISA Eagle sensor and we assess its spatial accuracy using data from multiple flight lines collected during a campaign in 2008 together with in situ surveyed ground control points.
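The band-interleaved-by-line layout mentioned in the abstract above can be illustrated with a short Python sketch. This is not APL code: `write_bil`, `read_bil`, `band_image` and `roundtrip_demo` are hypothetical helper names, the file is assumed to be headerless raw binary in native byte order, and the dimensions and data type are assumed to be known in advance (in practice they would usually come from an accompanying header file).

```python
import os
import tempfile

import numpy as np


def write_bil(path, cube):
    """Write a (lines, bands, samples) array as a headerless BIL file."""
    # Row-major flattening of (lines, bands, samples) is BIL order:
    # for each scan line, all samples of band 0, then band 1, and so on.
    np.ascontiguousarray(cube).tofile(path)


def read_bil(path, lines, bands, samples, dtype=np.uint16):
    """Read a headerless BIL file into an array of shape (lines, bands, samples)."""
    data = np.fromfile(path, dtype=dtype)
    expected = lines * bands * samples
    if data.size != expected:
        raise ValueError(f"expected {expected} values, found {data.size}")
    return data.reshape(lines, bands, samples)


def band_image(cube, band):
    """Extract one spectral band as a 2-D (lines, samples) image."""
    return cube[:, band, :]


def roundtrip_demo():
    """Write a tiny synthetic cube to disk and read it back (toy data only)."""
    cube = np.arange(2 * 3 * 4, dtype=np.uint16).reshape(2, 3, 4)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.bil")
        write_bil(path, cube)
        restored = read_bil(path, lines=2, bands=3, samples=4)
    return cube, restored
```

Because every scan line is stored contiguously, a file in this layout can be processed line by line without loading a multi-gigabyte flight line into memory at once.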
Abstract:
Blood-brain barrier (BBB) breakdown, demonstrable in vivo by enhanced MRI, is characteristic of new and expanding inflammatory lesions in relapsing remitting and chronic progressive multiple sclerosis (MS). Subtle leakage may also occur in primary progressive MS. However, the anatomical route(s) of BBB leakage have not been demonstrated. We investigated the possible involvement of interendothelial tight junctions (TJ) by examining the expression of TJ proteins (occludin and ZO-1) in blood vessels in active MS lesions from 8 cases of MS and in normal-appearing white matter (NAWM) from 6 cases. Blood vessels (10-50 per frozen section) were scanned using confocal laser scanning microscopy to acquire datasets for analysis. TJ abnormalities, manifested as beading, interruption, absence or diffuse cytoplasmic localization of fluorescence, or separation of junctions (putative opening), were frequent (affecting 40% of vessels) in oil red-O-positive active plaques but less frequent in NAWM (15%), and in normal (
Abstract:
The definitive paper by Stuiver and Polach (1977) established the conventions for reporting of 14C data for chronological and geophysical studies based on the radioactive decay of 14C in the sample since the year of sample death or formation. Several ways of reporting 14C activity levels relative to a standard were also established, but no specific instructions were given for reporting nuclear weapons testing (post-bomb) 14C levels in samples. Because the use of post-bomb 14C is becoming more prevalent in forensics, biology, and geosciences, a convention needs to be adopted. We advocate the use of fraction modern with a new symbol F14C to prevent confusion with the previously used Fm, which may or may not have been fractionation corrected. We also discuss the calibration of post-bomb 14C samples and the available datasets and compilations, but do not give a recommendation for a particular dataset.
Abstract:
We have previously published intermediate to high resolution spectroscopic observations of approximately 80 early B-type main-sequence stars situated in 19 Galactic open clusters/associations with Galactocentric distances distributed over 6 ≤ R_g ≤ 18 kpc. This current study collates and re-analyses these equivalent-width datasets using LTE and non-LTE model atmosphere techniques, in order to determine the stellar atmospheric parameters and abundance estimates for C, N, O, Mg, Al and Si. The latter should be representative of the present-day Galactic interstellar medium. Our extensive observational dataset permits the identification of sub-samples of stars with similar atmospheric parameters and of homogeneous subsets of lines. As such, this investigation represents the most extensive and systematic study of its kind to date. We conclude that the distribution of light elements (C, O, Mg & Si) in the Galactic disk can be represented by a linear, radial gradient of −0.07 ± 0.01 dex kpc−1. Our results for nitrogen and oxygen, viz. −0.09 ± 0.01 dex kpc−1 and −0.067 ± 0.008 dex kpc−1, are in excellent agreement with those found from the study of H II regions. We have also examined our datasets for evidence of an abrupt discontinuity in the metallicity of the Galactic disk near a Galactocentric distance of 10 kpc (see Twarog et al. 1997). However, there is no evidence to suggest that our data would be better fitted with a two-zone model. Moreover, we observe a N/O gradient of −0.04 ± 0.02 dex kpc−1, which is consistent with that found for other spiral galaxies (Vila-Costas & Edmunds 1993).
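The comparison between a linear radial gradient and a two-zone description can be sketched in Python as follows. This is an illustration only, not the authors' analysis: the data are synthetic (generated with an assumed slope of −0.07 dex kpc−1, an arbitrary zero point of 8.9 and an assumed scatter of 0.1 dex), and the two-zone model is simplified here to a piecewise-constant step at 10 kpc. All function names are hypothetical.

```python
import numpy as np


def fit_linear_gradient(r_kpc, abundance):
    """Least-squares fit of abundance = slope * R_g + intercept.

    Returns the slope, its standard error, and the residual sum of squares.
    """
    coeffs, cov = np.polyfit(r_kpc, abundance, 1, cov=True)
    slope, intercept = coeffs
    residuals = abundance - (slope * r_kpc + intercept)
    return float(slope), float(np.sqrt(cov[0, 0])), float(np.sum(residuals**2))


def fit_two_zone(r_kpc, abundance, break_kpc=10.0):
    """Residual sum of squares for a step model: one mean abundance per zone."""
    inner = abundance[r_kpc < break_kpc]
    outer = abundance[r_kpc >= break_kpc]
    return float(np.sum((inner - inner.mean()) ** 2)
                 + np.sum((outer - outer.mean()) ** 2))


def synthetic_sample(n=80, slope=-0.07, scatter=0.1, seed=0):
    """Synthetic abundances over 6-18 kpc; every number here is an assumption."""
    rng = np.random.default_rng(seed)
    r_kpc = rng.uniform(6.0, 18.0, n)
    abundance = 8.9 + slope * (r_kpc - 8.0) + rng.normal(0.0, scatter, n)
    return r_kpc, abundance
```

Both models here have two free parameters, so comparing their residual sums of squares directly is a fair first check. A real analysis would also weight each star by its abundance uncertainty.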
Abstract:
Course Scheduling consists of assigning lecture events to a limited set of specific timeslots and rooms. The objective is to satisfy as many soft constraints as possible, while maintaining a feasible solution timetable. The most successful techniques to date require a compute-intensive examination of the solution neighbourhood to direct searches to an optimum solution. Although they may require fewer neighbourhood moves than more exhaustive techniques to gain comparable results, they can take considerably longer to achieve success. This paper introduces an extended version of the Great Deluge Algorithm for the Course Timetabling problem which, while avoiding the problem of getting trapped in local optima, uses simple neighbourhood search heuristics to obtain solutions in a relatively short amount of time. The paper presents results based on a standard set of benchmark datasets, beating over half of the currently published best results, with improvements of up to 60% in some cases.
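For orientation, the basic Great Deluge acceptance rule can be sketched in Python as below. This is the plain, textbook form of the algorithm applied to a toy clash-minimisation problem, not the extended version the paper introduces. The clash list, the linear decay rate, the iteration count and all function names are assumptions made for illustration.

```python
import random


def great_deluge(initial, cost, neighbour, decay, max_iters, rng):
    """Minimise cost() with the basic Great Deluge acceptance rule."""
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    level = current_cost  # the "water level" starts at the initial cost
    for _ in range(max_iters):
        candidate = neighbour(current, rng)
        candidate_cost = cost(candidate)
        # Accept improving moves, or worsening moves still under the level.
        if candidate_cost <= current_cost or candidate_cost <= level:
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        level -= decay  # lower the level by a fixed amount each iteration
    return best, best_cost


# Toy instance (hypothetical): pairs of events that must not share a timeslot.
CLASHES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (1, 4)]
N_EVENTS, N_SLOTS = 5, 3


def clash_cost(slots):
    """Number of clashing event pairs placed in the same timeslot."""
    return sum(1 for a, b in CLASHES if slots[a] == slots[b])


def move_one_event(slots, rng):
    """Simple neighbourhood move: reassign one random event to a random slot."""
    changed = list(slots)
    changed[rng.randrange(N_EVENTS)] = rng.randrange(N_SLOTS)
    return tuple(changed)


def solve_toy(seed=0):
    """Run the search from an all-in-one-slot timetable with a seeded RNG."""
    rng = random.Random(seed)
    start = (0,) * N_EVENTS
    return great_deluge(start, clash_cost, move_one_event,
                        decay=0.01, max_iters=2000, rng=rng)
```

The falling level is what lets the search accept temporarily worse timetables early on, which is how the method avoids becoming trapped in local optima while still using cheap neighbourhood moves.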
Abstract:
The analysis of chironomid taxa and environmental datasets from 46 New Zealand lakes identified temperature (February mean air temperature) and lake production (chlorophyll a (Chl a)) as the main drivers of chironomid distribution. Temperature was the strongest driver of chironomid distribution and consequently produced the most robust inference models. We present two possible temperature transfer functions from this dataset. The most robust model (weighted averaging-partial least squares (WA-PLS), n = 36) was based on a dataset with the most productive (Chl a > 10 μg l−1) lakes removed. This model produced a coefficient of determination (r²jack) of 0.77, and a root mean squared error of prediction (RMSEPjack) of 1.31 °C. The Chl a transfer function (partial least squares (PLS), n = 37) was far less reliable, with an r²jack of 0.49 and an RMSEPjack of 0.46 log10 μg l−1. Both of these transfer functions could be improved by a revision of the taxonomy for the New Zealand chironomid taxa, particularly the genus Chironomus. The Chironomus morphotype was common in high altitude, cool, oligotrophic lakes and lowland, warm, eutrophic lakes. This could reflect the widespread distribution of one eurythermic species, or the collective distribution of a number of different Chironomus species with more limited tolerances. The Chl a transfer function could also be improved by inputting mean Chl a values into the inference model rather than the spot measurements that were available for this study.
Abstract:
PURPOSE. To examine internal consistency, refine the response scale, and obtain a linear scoring system for the visual function instrument, the Daily Living Tasks Dependent on Vision (DLTV). METHODS. Data were available from 186 participants with a clinical diagnosis of AMD who completed the 22-item DLTV (DLTV-22) according to a four-point ordinal response scale. An independent group of 386 participants with AMD were administered a reduced version of the DLTV with 11 items (DLTV-11), according to a five-point response scale. Rasch analysis was performed on both datasets and used to generate item statistics for measure order, response odds ratios per item and per person, and infit and outfit mean square statistics. The Rasch output from the DLTV-22 was examined to identify redundant items and for factorial validity and person and item measure separation reliabilities. RESULTS. The average rating for the DLTV-22 changed monotonically with the magnitude of the latent person trait. The expected versus observed average measures were extremely close, with step calibrations evenly separated for the four-point ordinal scale. In the case of the DLTV-11, step calibrations were not as evenly separated, suggesting that the five-point scale should be reduced to either a four- or three-point scale. Five items in the DLTV-22 were removed, and all 17 remaining items had good infit and outfit mean squares. PCA with residuals from Rasch analysis identified two domains containing 7 and 10 items each. The domains had high person separation reliabilities (0.86 and 0.77 for domains 1 and 2, respectively) and item measure reliabilities (0.99 and 0.98 for domains 1 and 2, respectively). CONCLUSIONS. With the improved internal consistency, the establishment of the accuracy and precision of the rating scale for the DLTV, and the establishment of a valid domain structure, we believe that it constitutes a useful instrument for assessing visual function in older adults with age-related macular degeneration.
Abstract:
The identification of non-linear systems using only observed finite datasets has become a mature research area over the last two decades. A class of linear-in-the-parameter models with universal approximation capabilities has been intensively studied and widely used due to the availability of many linear-learning algorithms and their inherent convergence conditions. This article presents a systematic overview of basic research on model selection approaches for linear-in-the-parameter models. One of the fundamental problems in non-linear system identification is to find the minimal model with the best model generalisation performance from observational data only. The important concepts in achieving good model generalisation used in various non-linear system-identification algorithms are first reviewed, including Bayesian parameter regularisation and model selection criteria based on cross validation and experimental design. A significant advance in machine learning has been the development of the support vector machine as a means for identifying kernel models based on the structural risk minimisation principle. The developments in convex optimisation-based model construction algorithms, including the support vector regression algorithms, are outlined. Input selection algorithms and on-line system identification algorithms are also included in this review. Finally, some industrial applications of non-linear models are discussed.
Abstract:
PEGS (Production and Environmental Generic Scheduler) is a generic production scheduler that produces good schedules over a wide range of problems. It is centralised, using search strategies with the Shifting Bottleneck algorithm. We have also developed an alternative distributed approach using software agents. In some cases this reduces run times by a factor of 10 or more. In most cases, the agent-based program also produces good solutions for published benchmark data, and the short run times make our program useful for a large range of problems. Test results show that the agents can produce schedules comparable to the best found so far for some benchmark datasets and actually better schedules than PEGS on our own random datasets. The flexibility that agents can provide for today's dynamic scheduling is also appealing. We suggest that in this sort of generic or commercial system, the agent-based approach is a good alternative.
Abstract:
Exam timetabling is one of the most important administrative activities that takes place in academic institutions. In this paper we present a critical discussion of the research on exam timetabling in the last decade or so. The last ten years have seen an increased level of attention on this important topic. There has been a range of significant contributions to the scientific literature both in terms of theoretical and practical aspects. The main aim of this survey is to highlight the new trends and key research achievements of the last decade. We also aim to outline a range of relevant important research issues and challenges that have been generated by this body of work.
We first define the problem and review previous survey papers. Algorithmic approaches are then classified and discussed. These include early techniques (e.g. graph heuristics) and state-of-the-art approaches including meta-heuristics, constraint based methods, multi-criteria techniques, hybridisations, and recent new trends concerning neighbourhood structures, which are motivated by raising the generality of the approaches. Summarising tables are presented to provide an overall view of these techniques. We discuss some issues on decomposition techniques, system tools and languages, models and complexity. We also present and discuss some important issues which have come to light concerning the public benchmark exam timetabling data. Different versions of problem datasets with the same name have been circulating in the scientific community in the last ten years, which has generated a significant amount of confusion. We clarify the situation and present a re-naming of the widely studied datasets to avoid future confusion. We also highlight which research papers have dealt with which dataset. Finally, we draw upon our discussion of the literature to present a (non-exhaustive) range of potential future research directions and open issues in exam timetabling research.
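As one example of the early graph heuristics mentioned above, a largest-degree-first colouring can be sketched in Python. The survey does not prescribe this exact variant; the sketch assumes exams are graph vertices, a conflict (at least one shared student) is an edge, and timeslots are colours. The function name and the conflict list used to exercise it are hypothetical.

```python
def largest_degree_first(conflicts, n_exams):
    """Greedy timetable: place the most-conflicted exams first.

    Each exam gets the lowest-numbered timeslot not already used by an
    exam it conflicts with. Returns a dict mapping exam -> timeslot.
    """
    neighbours = {exam: set() for exam in range(n_exams)}
    for a, b in conflicts:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Highest degree first; ties are broken by exam index for determinism.
    order = sorted(range(n_exams), key=lambda e: (-len(neighbours[e]), e))

    slot = {}
    for exam in order:
        taken = {slot[n] for n in neighbours[exam] if n in slot}
        chosen = 0
        while chosen in taken:
            chosen += 1
        slot[exam] = chosen
    return slot
```

Scheduling the hardest exams first is what makes the heuristic effective in practice, although it gives no guarantee of using the minimum number of timeslots.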
Abstract:
DN (diabetic nephropathy) is the leading cause of end-stage renal disease worldwide and develops in 25-40% of patients with Type 1 or Type 2 diabetes mellitus. Elevated blood glucose over long periods together with glomerular hypertension leads to progressive glomerulosclerosis and tubulointerstitial fibrosis in susceptible individuals. Central to the pathology of DN are cytokines and growth factors such as TGF-beta (transforming growth factor beta) superfamily members, including BMPs (bone morphogenetic proteins) and TGF-beta 1, which play key roles in fibrogenic responses of the kidney, including podocyte loss, mesangial cell hypertrophy, matrix accumulation and tubulointerstitial fibrosis. Many of these responses can be mimicked in in vitro models of cells cultured in high glucose. We have applied differential gene expression technologies to identify novel genes expressed in in vitro and in vivo models of DN and, importantly, in human renal tissue. By mining these datasets and probing the regulation of expression and actions of specific molecules, we have identified novel roles for molecules such as Gremlin, IHG-1 (induced in high glucose-1) and CTGF (connective tissue growth factor) in DN and potential regulators of their bioactions.
Abstract:
This paper proposes a new hierarchical learning structure, namely the holistic triple learning (HTL), for extending the binary support vector machine (SVM) to multi-classification problems. For an N-class problem, an HTL constructs a decision tree up to a depth of [...]. A leaf node of the decision tree is allowed to be placed with a holistic triple learning unit whose generalisation abilities are assessed and approved. Meanwhile, the remaining nodes in the decision tree each accommodate a standard binary SVM classifier. The holistic triple classifier is a regression model trained on three classes, whose training algorithm originates from a recently proposed implementation technique, namely the least-squares support vector machine (LS-SVM). A major novelty of the holistic triple classifier is the reduced number of support vectors in the solution. For the resultant HTL-SVM, an upper bound of the generalisation error can be obtained. The time complexity of training the HTL-SVM is analysed, and is shown to be comparable to that of training the one-versus-one (1-vs-1) SVM, particularly on small-scale datasets. Empirical studies show that the proposed HTL-SVM achieves competitive classification accuracy with a reduced number of support vectors compared to the popular 1-vs-1 alternative.
Abstract:
Periodontitis, a chronic inflammatory disease of the tissues supporting the teeth, is characterized by an exaggerated host immune and inflammatory response to periopathogenic bacteria. Toll-like receptor activation, cytokine network induction, and accumulation of neutrophils at the site of inflammation are important in the host defense against infection. At the same time, induction of immune tolerance and the clearance of neutrophils from the site of infection are essential in the control of the immune response, resolution of inflammation, and prevention of tissue destruction. Using a human monocytic cell line, we demonstrate that Porphyromonas gingivalis lipopolysaccharide (LPS), which is a major etiological factor in periodontal disease, induces only partial immune tolerance, with continued high production of interleukin-8 (IL-8) but diminished secretion of tumor necrosis factor alpha (TNF-α) after repeated challenge. This cytokine response has functional consequences for other immune cells involved in the response to infection. Primary human neutrophils incubated with P. gingivalis LPS-treated naïve monocyte supernatant displayed a high migration index and increased apoptosis. In contrast, neutrophils treated with P. gingivalis LPS-tolerized monocyte supernatant showed a high migration index but significantly decreased apoptosis. Overall, these findings suggest that induction of an imbalanced immune tolerance in monocytes by P. gingivalis LPS, which favors continued secretion of IL-8 but decreased TNF-α production, may be associated with enhanced migration of neutrophils to the site of infection but also with decreased apoptosis and may play a role in the chronic inflammatory state seen in periodontal disease.