957 results for Imbalanced datasets
Abstract:
Current methods for initialising coupled atmosphere-ocean forecasts often rely on the use of separate atmosphere and ocean analyses, the combination of which can leave the coupled system imbalanced at the beginning of the forecast, potentially accelerating the development of errors. Using a series of experiments with the European Centre for Medium-Range Weather Forecasts coupled system, the magnitude and extent of these so-called initialisation shocks are quantified, and their impact on forecast skill is measured. It is found that forecasts initialised by separate ocean and atmospheric analyses do exhibit initialisation shocks in lower atmospheric temperature, when compared to forecasts initialised using a coupled data assimilation method. These shocks result in as much as a doubling of root-mean-square error on the first day of the forecast in some regions, and in increases that are sustained for the duration of the 10-day forecasts performed here. However, the impacts of this choice of initialisation on forecast skill, assessed using independent datasets, were found to be negligible, at least over the limited period studied. Larger initialisation shocks are found to follow a change in either the atmospheric or ocean model component between the analysis and forecast phases: changes in the ocean component can lead to sea surface temperature shocks of more than 0.5 K in some equatorial regions during the first day of the forecast. Implications for the development of coupled forecast systems, particularly with respect to coupled data assimilation methods, are discussed.
Abstract:
Aims. Although the time of the Maunder minimum (1645–1715) is widely known as a period of extremely low solar activity, it is still being debated whether solar activity during that period might have been moderate or even higher than the current solar cycle (number 24). We have revisited all existing evidence and datasets, both direct and indirect, to assess the level of solar activity during the Maunder minimum. Methods. We discuss the East Asian naked-eye sunspot observations, the telescopic solar observations, the fraction of sunspot active days, the latitudinal extent of sunspot positions, auroral sightings at high latitudes, cosmogenic radionuclide data as well as solar eclipse observations for that period. We also consider peculiar features of the Sun (very strong hemispheric asymmetry of the sunspot location, unusual differential rotation and the lack of the K-corona) that imply a special mode of solar activity during the Maunder minimum. Results. The level of solar activity during the Maunder minimum is reassessed on the basis of all available datasets. Conclusions. We conclude that solar activity was indeed at an exceptionally low level during the Maunder minimum. Although the exact level is still unclear, it was definitely lower than during the Dalton minimum of around 1800 and significantly below that of the current solar cycle #24. Claims of a moderate-to-high level of solar activity during the Maunder minimum are rejected with a high confidence level.
Abstract:
This paper presents the two datasets (ARENA and P5) and the challenge that form a part of the PETS 2015 workshop. The datasets consist of scenarios recorded using multiple visual and thermal sensors. The scenarios in the ARENA dataset involve different staged activities around a parked vehicle in a parking lot in the UK, and those in the P5 dataset involve different staged activities around the perimeter of a nuclear power plant in Sweden. The scenarios of each dataset are grouped into ‘Normal’, ‘Warning’ and ‘Alarm’ categories. The Challenge specifically includes tasks that account for different steps in a video understanding system: Low-Level Video Analysis (object detection and tracking), Mid-Level Video Analysis (‘atomic’ event detection) and High-Level Video Analysis (‘complex’ event detection). The evaluation methodology used for the Challenge includes well-established measures.
Abstract:
This paper presents a quantitative evaluation of a tracking system on the PETS 2015 Challenge datasets using well-established performance measures. Using existing tools, the tracking system implements an end-to-end pipeline that includes object detection, tracking and post-processing stages. The evaluation results are presented on the provided sequences of both the ARENA and P5 datasets of the PETS 2015 Challenge. The results show an encouraging performance of the tracker in terms of accuracy but a greater tendency toward cardinality errors and ID changes on both datasets. Moreover, the analysis shows a better performance of the tracker on visible imagery than on thermal imagery.
Abstract:
Datasets containing information to locate and identify water bodies have been generated from data locating static-water-bodies with resolution of about 300 m (1/360 deg) recently released by the Land Cover Climate Change Initiative (LC CCI) of the European Space Agency. The LC CCI water-bodies dataset has been obtained from multi-temporal metrics based on time series of the backscattered intensity recorded by ASAR on Envisat between 2005 and 2010. The new derived datasets provide coherently: distance to land, distance to water, water-body identifiers and lake-centre locations. The water-body identifier dataset locates the water bodies assigning the identifiers of the Global Lakes and Wetlands Database (GLWD), and lake centres are defined for in-land waters for which GLWD IDs were determined. The new datasets therefore link recent lake/reservoir/wetlands extent to the GLWD, together with a set of coordinates which locates unambiguously the water bodies in the database. Information on distance-to-land for each water cell and the distance-to-water for each land cell has many potential applications in remote sensing, where the applicability of geophysical retrieval algorithms may be affected by the presence of water or land within a satellite field of view (image pixel). During the generation and validation of the datasets some limitations of the GLWD database and of the LC CCI water-bodies mask have been found. Some examples of the inaccuracies/limitations are presented and discussed. Temporal change in water-body extent is common. Future versions of the LC CCI dataset are planned to represent temporal variation, and this will permit these derived datasets to be updated.
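The distance-to-land and distance-to-water layers described above can be illustrated with a toy example. This is only a minimal sketch, not the procedure used to build the LC CCI derived datasets: it assumes a small rectangular mask, counts distance in 4-connected grid cells rather than in metres or degrees, and the names `distance_to`, `WATER` and `LAND` are hypothetical.

```python
from collections import deque

WATER, LAND = 1, 0  # hypothetical cell codes for this sketch


def distance_to(mask, target):
    """Return, for every cell, the 4-connected step count to the nearest
    cell whose value equals `target` (multi-source breadth-first search)."""
    rows, cols = len(mask), len(mask[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every target cell at distance 0.
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == target:
                dist[r][c] = 0
                queue.append((r, c))
    # Expand outwards one cell at a time.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


# Toy mask: a 2x2 lake surrounded by land.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
to_land = distance_to(mask, LAND)    # meaningful for water cells
to_water = distance_to(mask, WATER)  # meaningful for land cells
```

A real global product would need geodesic distances on a latitude-longitude grid; the cell-count metric here only shows the idea of pairing each water cell with its nearest land cell and vice versa.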
Abstract:
There is an increasing interest in the application of Evolutionary Algorithms (EAs) to induce classification rules. This hybrid approach can benefit areas where classical methods for rule induction have not been very successful. One example is the induction of classification rules in imbalanced domains. Imbalanced data occur when one or more classes heavily outnumber other classes. Frequently, classical machine learning (ML) classifiers are not able to learn in the presence of imbalanced data sets, inducing classification models that always predict the most numerous classes. In this work, we propose a novel hybrid approach to deal with this problem. We create several balanced data sets with all minority class cases and a random sample of majority class cases. These balanced data sets are fed to classical ML systems that produce rule sets. The rule sets are combined creating a pool of rules and an EA is used to build a classifier from this pool of rules. This hybrid approach has some advantages over undersampling, since it reduces the amount of discarded information, and some advantages over oversampling, since it avoids overfitting. The proposed approach was experimentally analysed and the experimental results show an improvement in the classification performance measured as the area under the receiver operating characteristics (ROC) curve.
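The data-preparation step described above (several balanced data sets, each holding all minority cases plus a random sample of majority cases) might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a two-class problem, draws as many majority cases as there are minority cases, and the function name `make_balanced_subsets` and its parameters are hypothetical. The rule induction and evolutionary steps are not shown.

```python
import random
from collections import Counter


def make_balanced_subsets(samples, labels, minority_label, n_subsets, seed=0):
    """Build `n_subsets` balanced data sets. Each one contains every
    minority case plus an equally sized random sample (without
    replacement) of the majority cases."""
    rng = random.Random(seed)
    pairs = list(zip(samples, labels))
    minority = [p for p in pairs if p[1] == minority_label]
    majority = [p for p in pairs if p[1] != minority_label]
    if len(majority) < len(minority):
        raise ValueError("minority_label is not the minority class")
    subsets = []
    for _ in range(n_subsets):
        # Assumption: 1:1 class ratio in every subset.
        sampled = rng.sample(majority, len(minority))
        subset = minority + sampled
        rng.shuffle(subset)
        subsets.append(subset)
    return subsets


# Toy data: 3 minority ("pos") and 12 majority ("neg") cases.
samples = list(range(15))
labels = ["pos"] * 3 + ["neg"] * 12
subsets = make_balanced_subsets(samples, labels, "pos", n_subsets=4)
class_counts = [Counter(y for _, y in s) for s in subsets]
```

Because each subset draws a different random sample of the majority class, the pool of rules learned across subsets sees more majority cases than a single undersampled data set would.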
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Zones of mixing between shallow groundwaters of different composition were unravelled by two-way regionalized classification, a technique based on correspondence analysis (CA), cluster analysis (ClA) and discriminant analysis (DA), aided by gridding, map-overlay and contouring tools. The shallow groundwaters are from a granitoid plutonite in the Fundão region (central Portugal). Correspondence analysis detected three natural clusters in the working dataset: 1, weathering; 2, domestic effluents; 3, fertilizers. Cluster analysis set an alternative distribution of the samples by the three clusters. Group memberships obtained by correspondence analysis and by cluster analysis were optimized by discriminant analysis and gridded, and the grid nodes were coded as follows: codes 1, 2 or 3 were used when classification by correspondence analysis and cluster analysis produced the same results; code 0 when the grid node was first assigned to cluster 1 and then to cluster 2 or vice versa (mixing between weathering and effluents); code 4 in the other cases (mixing between agriculture and the other influences). Code-3 areas were systematically surrounded by code-4 areas, an observation attributed to hydrodynamic dispersion. Accordingly, the extent of code-4 areas in two orthogonal directions was assumed proportional to the longitudinal and transverse dispersivities of local soils. The results (0.7-16.8 and 0.4-4.3 m, respectively) are acceptable at the macroscopic scale. The ratios between longitudinal and transverse dispersivities (1.2-11.1) are also in agreement with results obtained by other studies.
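The grid-node coding rule stated above can be written as a small function. This is a sketch of the rule as worded in the abstract only; the function name `mixing_code` and its argument names are hypothetical, and the upstream CA, ClA and DA steps are assumed to have already produced a cluster label (1, 2 or 3) for each node.

```python
def mixing_code(ca_cluster, cla_cluster):
    """Combine the cluster assigned by correspondence analysis (CA) and by
    cluster analysis (ClA) into a single grid-node code.

    1, 2, 3 -> both classifications agree on that cluster
    0       -> clusters 1 and 2 in either order (weathering/effluents mixing)
    4       -> any other disagreement (mixing involving cluster 3)
    """
    valid = {1, 2, 3}
    if ca_cluster not in valid or cla_cluster not in valid:
        raise ValueError("cluster labels must be 1, 2 or 3")
    if ca_cluster == cla_cluster:
        return ca_cluster
    if {ca_cluster, cla_cluster} == {1, 2}:
        return 0
    return 4


# Toy row of grid nodes: (CA label, ClA label) per node.
nodes = [(1, 1), (1, 2), (2, 1), (3, 3), (3, 1), (2, 3)]
codes = [mixing_code(ca, cla) for ca, cla in nodes]
```

Applied to every node of the grid, the resulting codes form the map on which the code-3 and code-4 areas are then contoured.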
Abstract:
Traditional pattern recognition techniques cannot handle the classification of large datasets with both efficiency and effectiveness. In this context, the Optimum-Path Forest (OPF) classifier was recently introduced, trying to achieve high recognition rates and low computational cost. Although OPF was much faster than Support Vector Machines for training, it was slightly slower for classification. In this paper, we present the Efficient OPF (EOPF), which is an enhanced and faster version of the traditional OPF, and validate it for the automatic recognition of white matter and gray matter in magnetic resonance images of the human brain. © 2010 IEEE.
Abstract:
In this work, a new approach for supervised pattern recognition is presented which improves the learning algorithm of the Optimum-Path Forest classifier (OPF), centered on detection and elimination of outliers in the training set. Identification of outliers is based on a penalty computed for each sample in the training set from the corresponding number of imputable false positive and false negative classification of samples. This approach enhances the accuracy of OPF while still gaining in classification time, at the expense of a slight increase in training time. © 2010 Springer-Verlag.
Abstract:
The main purpose of this work is to report the presence of spurious discontinuities in the pattern of diurnal variation of sea level pressure in three reanalysis datasets: that of the National Centers for Environmental Prediction (NCEP) and the National Center for Atmospheric Research (R1), that of the NCEP and the Department of Energy (R2), and that of the European Centre for Medium-Range Weather Forecasts (ERA-40). Such discontinuities can be connected to the major changes in the global observing system that have occurred throughout the reanalysis years. In R1, the period richest in discontinuities is 1956-1958, coinciding with the start of the modern radiosonde observation network. The rapid increase in the density of surface-based observations from 1967 also had an important impact on both R1 and ERA-40, with a larger impact on R1. The reanalyses show discontinuities in the 1970s related to the assimilation of radiances measured by the Vertical Temperature Profile Radiometer and TIROS-N Operational Vertical Sounders onboard satellites. In ERA-40, which additionally assimilated Special Sensor Microwave/Imager data, there are discontinuities in 1987-1989. R1 also presents further discontinuities in 1988-1993, likely connected to the replacement/introduction of NOAA-series satellites with different biases, and to the volcanic eruption of Mount Pinatubo in June 1991, which is known to have severely affected measurements of infrared radiances for several years. The discontinuities in 1996-1998 might be partially connected to the change in the type of radiosonde, from VIZ-B to VIZ-B2. R2, which covers only the satellite era (1979 onward), shows discontinuities mainly in 1992, 1996-1997, and 2001. The discontinuities in 1992 and 2001 might have been caused by changes in the satellite measurements and those in 1996-1997 by changes in the land-based observation network. © 2012 Springer-Verlag.
Abstract:
We consider three-body systems in two dimensions with zero-range interactions for general masses and interaction strengths. The momentum-space Schrödinger equation is solved numerically and in the Born-Oppenheimer (BO) approximation. The BO expression is derived using separable potentials and yields a concise adiabatic potential between the two heavy particles. The BO potential is Coulomb-like and exponentially decreasing at small and large distances, respectively. While we find similar qualitative features to previous studies, we find important quantitative differences. Our results demonstrate that mass-imbalanced systems that are accessible in the field of ultracold atomic gases can have a rich three-body bound state spectrum in two-dimensional geometries. Small light-heavy mass ratios increase the number of bound states. For 87Rb-87Rb-6Li and 133Cs-133Cs-6Li we find respectively three and four bound states. © 2013 IOP Publishing Ltd.
Abstract:
Includes bibliography
Abstract:
As part of ongoing efforts to strengthen the statistical capacities of National Statistical Offices (NSOs) in the region, the Economic Commission for Latin America and the Caribbean (ECLAC) convened a two-day Regional Training Workshop on Data Sharing, Data Ownership and Harmonization of Survey Datasets on 26-27 August 2009 at the Cascadia Hotel, Trinidad and Tobago. This workshop was one of the concluding activities of the Project on Improving Household Surveys in the Caribbean, which has been implemented by the ECLAC Subregional office since 2007.