920 results for Modular products


Relevance:

20.00% 20.00%

Publisher:

Abstract:

Inducing rules from very large datasets is one of the most challenging areas in data mining. Several approaches exist for scaling up classification rule induction to large datasets, namely data reduction and the parallelisation of classification rule induction algorithms. In the area of parallelisation, most of the work has concentrated on the Top Down Induction of Decision Trees (TDIDT), also known as the ‘divide and conquer’ approach. However, powerful alternative algorithms exist that induce modular rules. Most of these alternative algorithms follow the ‘separate and conquer’ approach of inducing rules, but very little work has been done to make the ‘separate and conquer’ approach scale better on large training data. This paper examines the potential of the recently developed blackboard-based J-PMCRI methodology for parallelising modular classification rule induction algorithms that follow the ‘separate and conquer’ approach. A concrete implementation of the methodology is evaluated empirically on very large datasets.
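The ‘separate and conquer’ idea mentioned above can be illustrated with a minimal, sequential sketch. It is a simplified Prism-style inducer for categorical attributes and is not the J-PMCRI methodology itself; the names `induce_rules`, `term_quality` and the `"class"` key are assumptions made for this illustration, and clashes (instances that cannot be separated) are handled naively.

```python
from typing import Dict, List, Tuple

Instance = Dict[str, str]                     # attribute -> value, plus a class key
Term = Tuple[str, str]                        # (attribute, value)
Rule = Tuple[List[Term], str]                 # (rule terms, class label)


def covers(terms: List[Term], row: Instance) -> bool:
    """True if the instance satisfies every (attribute, value) term."""
    return all(row[a] == v for a, v in terms)


def term_quality(term: Term, subset: List[Instance], label: str, target: str):
    """Score a candidate term by (precision for `label`, positives covered)."""
    covered = [r for r in subset if r[term[0]] == term[1]]
    positives = sum(r[target] == label for r in covered)
    return positives / len(covered), positives


def induce_rules(data: List[Instance], target: str = "class") -> List[Rule]:
    rules: List[Rule] = []
    for label in sorted({row[target] for row in data}):
        remaining = list(data)                # each class starts from the full set
        while any(row[target] == label for row in remaining):
            terms: List[Term] = []
            subset = remaining
            # Conquer: add terms until the covered subset contains only `label`.
            while any(row[target] != label for row in subset):
                used = {a for a, _ in terms}
                candidates = sorted({(a, row[a]) for row in subset
                                     for a in row if a != target and a not in used})
                if not candidates:
                    break                     # clash: no attribute left to specialise on
                best = max(candidates,
                           key=lambda t: term_quality(t, subset, label, target))
                terms.append(best)
                subset = [r for r in subset if r[best[0]] == best[1]]
            rules.append((terms, label))
            # Separate: remove the instances covered by the finished rule.
            remaining = [r for r in remaining if not covers(terms, r)]
    return rules
```

Each finished rule stands alone, which is what makes the rule set modular: rules need not share a common root attribute as the branches of a decision tree do.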

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The Prism family of algorithms induces modular classification rules which, in contrast to decision tree induction algorithms, do not necessarily fit together into a decision tree structure. Classifiers induced by Prism algorithms achieve accuracy comparable to that of decision trees and in some cases even outperform them. Both kinds of algorithms tend to overfit on large and noisy datasets, and this has led to the development of pruning methods. Pruning methods use various metrics to truncate decision trees or to eliminate whole rules or single rule terms from a Prism rule set. For decision trees many pre-pruning and post-pruning methods exist; for Prism algorithms, however, only one pre-pruning method has been developed, J-pruning. Recent work with Prism algorithms examined J-pruning in the context of very large datasets and found that the current method does not use its full potential. This paper revisits the J-pruning method for the Prism family of algorithms, develops a new pruning method, Jmax-pruning, discusses it in theoretical terms and evaluates it empirically.
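The abstract does not define the metric behind J-pruning. Assuming it is the J-measure of Smyth and Goodman, as the name suggests, the quantity for a rule of the form IF Y = y THEN X = x could be computed as in the sketch below. The function name `j_measure` and its parameter names are chosen for this illustration only.

```python
import math


def j_measure(p_x: float, p_y: float, p_x_given_y: float) -> float:
    """J-measure (in bits) of the rule IF Y=y THEN X=x.

    p_x          prior probability of the rule's class
    p_y          probability that the rule's left-hand side fires
    p_x_given_y  probability of the class among instances the rule covers
    """
    if not 0.0 < p_x < 1.0:
        raise ValueError("p_x must lie strictly between 0 and 1")

    def part(posterior: float, prior: float) -> float:
        # By convention 0 * log(0) is taken as 0.
        if posterior == 0.0:
            return 0.0
        return posterior * math.log2(posterior / prior)

    # Information gained about the class when the rule fires ...
    cross_entropy = part(p_x_given_y, p_x) + part(1.0 - p_x_given_y, 1.0 - p_x)
    # ... weighted by how often the rule fires.
    return p_y * cross_entropy
```

A rule that covers half the data and is always right about a class with prior 0.5 scores `j_measure(0.5, 0.5, 1.0)`, while a rule whose covered instances merely mirror the prior scores zero.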

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The Prism family of algorithms induces modular classification rules, in contrast to the Top Down Induction of Decision Trees (TDIDT) approach, which induces classification rules in the intermediate form of a tree structure. Both approaches achieve comparable classification accuracy, although in some cases Prism outperforms TDIDT. For both approaches, pre-pruning facilities have been developed to prevent the induced classifiers from overfitting on noisy datasets, by cutting rule terms or whole rules or by truncating decision trees according to certain metrics. Many pre-pruning mechanisms have been developed for the TDIDT approach, but for the Prism family the only existing pre-pruning facility is J-pruning. J-pruning works not only on Prism algorithms but also on TDIDT. Although it has been shown that J-pruning produces good results, this work points out that J-pruning does not use its full potential. The original J-pruning facility is examined and the use of a new pre-pruning facility, called Jmax-pruning, is proposed and evaluated empirically. A possible pre-pruning facility for TDIDT based on Jmax-pruning is also discussed.
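The abstract does not spell out how the two facilities differ. One plausible reading, assumed here, is that J-pruning stops adding rule terms at the first drop in a rule's J-value, whereas Jmax-pruning lets the rule grow fully and then cuts it back to the term at which the J-value peaked. The sketch below compares the two cut points on a given sequence of J-values (one value per appended term); both function names are hypothetical.

```python
from typing import List


def j_pruning_cut(j_values: List[float]) -> int:
    """Terms kept if induction stops as soon as the J-value decreases."""
    kept = 0
    previous = float("-inf")
    for j in j_values:
        if j < previous:
            break                 # first drop: stop specialising the rule
        previous = j
        kept += 1
    return kept


def jmax_pruning_cut(j_values: List[float]) -> int:
    """Terms kept if the full rule is cut back to its highest J-value."""
    if not j_values:
        return 0
    # index() returns the first occurrence, so ties favour the shorter rule.
    return j_values.index(max(j_values)) + 1
```

For a sequence such as `[0.10, 0.08, 0.25, 0.20]`, stopping at the first drop keeps a single term, while cutting back to the maximum keeps three. Under the stated assumption, this is the sense in which a stop-at-first-drop rule may leave potential unused.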

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Advances in hardware and software in the past decade make it possible to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence of these advances in order to cope with the real-time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data from continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, some applications require the data to be analysed in real time, as soon as it is captured. This is the case, for example, when the data stream is infinite, fast changing, or simply too large to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule-based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new rules and removing old ones. It differs from the more popular decision-tree-based classifiers in that it tends to leave data instances unclassified rather than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting, which will also address the problem of changing feature domain values.
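The behaviour described above (abstaining when no rule applies, and retiring rules that stop performing) can be sketched as follows. This is not the eRules implementation: the class names, the `min_accuracy` and `min_trials` thresholds, and the buffer of unclassified instances are all assumptions made for illustration, and the step that induces new rules from the buffer is omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rule:
    conditions: Dict[str, str]    # attribute -> required value
    label: str
    hits: int = 0
    misses: int = 0

    def covers(self, x: Dict[str, str]) -> bool:
        return all(x.get(a) == v for a, v in self.conditions.items())

    def accuracy(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 1.0


class AbstainingRuleSet:
    def __init__(self, min_accuracy: float = 0.6, min_trials: int = 5):
        self.rules: List[Rule] = []
        self.unclassified: List[Dict[str, str]] = []   # candidates for new rules
        self.min_accuracy = min_accuracy               # hypothetical threshold
        self.min_trials = min_trials                   # hypothetical threshold

    def predict(self, x: Dict[str, str]) -> Optional[str]:
        """Return the first matching rule's label, or None to abstain."""
        for rule in self.rules:
            if rule.covers(x):
                return rule.label
        return None

    def update(self, x: Dict[str, str], true_label: str) -> None:
        """Score the first covering rule once the true label is known."""
        for rule in self.rules:
            if rule.covers(x):
                if rule.label == true_label:
                    rule.hits += 1
                else:
                    rule.misses += 1
                tried = rule.hits + rule.misses
                if tried >= self.min_trials and rule.accuracy() < self.min_accuracy:
                    self.rules.remove(rule)            # retire an outdated rule
                return
        # No rule covered the instance: keep it for later rule induction.
        self.unclassified.append({**x, "class": true_label})
```

Returning `None` instead of a default class is the design choice that separates this from a decision tree, which always routes an instance to some leaf.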

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The primary objective was to determine fatty acid composition of skinless chicken breast and leg meat portions and chicken burgers and nuggets from the economy price range, standard price range (both conventional intensive rearing) and the organic range from four leading supermarkets. Few significant differences in the SFA, MUFA and PUFA composition of breast and leg meat portions were found among price ranges, and supermarket had no effect. No significant differences in fatty acid concentrations of economy and standard chicken burgers were found, whereas economy chicken nuggets had higher C16:1, C18:1 cis, C18:1 trans and C18:3 n-3 concentrations than had standard ones. Overall, processed chicken products had much higher fat contents and SFA than had whole meat. Long chain n-3 fatty acids had considerably lower concentrations in processed products than in whole meat. Overall there was no evidence that organic chicken breast or leg meat had a more favourable fatty acid composition than had meat from conventionally reared birds.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

This study investigates the child (L1) acquisition of properties at the interfaces of morpho-syntax, syntax-semantics and syntax-pragmatics, by focusing on inflected infinitives in European Portuguese (EP). Three child groups were tested, 6–7-year-olds, 9–10-year-olds and 11–12-year-olds, as well as an adult control group. The data demonstrate that children as young as 6 have knowledge of the morpho-syntactic properties of inflected infinitives, although they seem at first glance to show partially insufficient knowledge of their syntax–semantic interface properties (i.e. non-obligatory control properties), unlike children aged 9 and older, who show clearer evidence of knowledge of both types of properties. However, in general, both morpho-syntactic and syntax–semantics interface properties are also accessible to 6–7-year-old children, although these children prefer a range of interpretations partially different from that of the adults; in certain cases, they may not appeal to certain pragmatic inferences that permit additional interpretations to adults and older children. Crucially, our data demonstrate that EP children master the two types of properties of inflected infinitives years before Brazilian Portuguese children do (Pires and Rothman, 2009a, 2009b), reasons for and implications of which we discuss in detail.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Global NDVI data are routinely derived from the AVHRR, SPOT-VGT, and MODIS/Terra earth observation records for a range of applications from terrestrial vegetation monitoring to climate change modeling. This has led to substantial interest in the harmonization of multisensor records. Most evaluations of the internal consistency and continuity of global multisensor NDVI products have focused on time-series harmonization in the spectral domain, often neglecting the spatial domain. We fill this void by applying variogram modeling (a) to evaluate the differences in spatial variability between 8-km AVHRR, 1-km SPOT-VGT, and 1-km, 500-m, and 250-m MODIS NDVI products over eight EOS (Earth Observing System) validation sites, and (b) to characterize the decay of spatial variability as a function of pixel size (i.e. data regularization) for spatially aggregated Landsat ETM+ NDVI products and a real multisensor dataset. First, we demonstrate that the conjunctive analysis of two variogram properties, the sill and the mean length scale metric, provides a robust assessment of the differences in spatial variability between multiscale NDVI products that are due to spatial (nominal pixel size, point spread function, and view angle) and non-spatial (sensor calibration, cloud clearing, atmospheric corrections, and length of multi-day compositing period) factors. Next, we show that as the nominal pixel size increases, the decay of spatial information content follows a logarithmic relationship, with a stronger fit for the spatially aggregated NDVI products (R² = 0.9321) than for the native-resolution AVHRR, SPOT-VGT, and MODIS NDVI products (R² = 0.5064). This relationship serves as a reference for evaluation of the differences in spatial variability and length scales in multiscale datasets at native or aggregated spatial resolutions.
The outcomes of this study suggest that multisensor NDVI records cannot be integrated into a long-term data record without proper consideration of all factors affecting their spatial consistency. Hence, we propose an approach for selecting the spatial resolution, at which differences in spatial variability between NDVI products from multiple sensors are minimized. This approach provides practical guidance for the harmonization of long-term multisensor datasets.
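Two of the computations named in this abstract can be sketched in a few lines: an empirical semivariance at a given lag, and a least-squares logarithmic fit with its R². The sketch assumes a regularly spaced one-dimensional transect rather than the authors' image data, and the function names `semivariance` and `fit_log` are illustrative only.

```python
import math
from typing import Sequence, Tuple


def semivariance(values: Sequence[float], lag: int) -> float:
    """Empirical semivariance of a regularly spaced transect at an integer lag.

    gamma(h) = sum((z_i - z_{i+h})^2) / (2 * N(h)), N(h) = number of pairs.
    """
    if lag < 1 or lag >= len(values):
        raise ValueError("lag must be between 1 and len(values) - 1")
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))


def fit_log(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a * ln(x) + b; returns (a, b, r_squared)."""
    lx = [math.log(v) for v in x]
    n = len(lx)
    mean_lx = sum(lx) / n
    mean_y = sum(y) / n
    sxx = sum((u - mean_lx) ** 2 for u in lx)
    sxy = sum((u - mean_lx) * (v - mean_y) for u, v in zip(lx, y))
    a = sxy / sxx
    b = mean_y - a * mean_lx
    ss_res = sum((v - (a * u + b)) ** 2 for u, v in zip(lx, y))
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    return a, b, 1.0 - ss_res / ss_tot
```

In practice the sill would be read from a model fitted to `semivariance` over many lags, and `fit_log` would take pixel sizes as `x` and a variability measure as `y`; both steps are left out of this sketch.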

Relevance:

20.00% 20.00%

Publisher:

Abstract:

NOAA's National Environmental Satellite, Data, and Information Service (NESDIS) has generated sea surface temperature (SST) products from Geostationary Operational Environmental Satellite (GOES)-East (E) and GOES-West (W) on an operational basis since December 2000. Since that time, a process of continual development has produced steady improvements in product accuracy. Recent improvements extended the capability to permit generation of operational SST retrievals from the Japanese Multifunction Transport Satellite (MTSAT)-1R and the European Meteosat Second Generation (MSG) satellite, thereby extending spatial coverage. The four geostationary satellites (at longitudes of 75°W, 135°W, 140°E, and 0°) provide high temporal SST retrievals for most of the tropics and midlatitudes, with the exception of a region between 60° and 80°E. Because of ongoing development, the quality of these retrievals now approaches that of SST products from the polar-orbiting Advanced Very High Resolution Radiometer (AVHRR). These products from GOES provide hourly regional imagery, 3-hourly hemispheric imagery, 24-h merged composites, a GOES SST level 2 preprocessed product every 1/2 h for each hemisphere, and a match-up data file for each product. The MTSAT and the MSG products include hourly, 3-hourly, and 24-h merged composites. These products provide the user community with a reliable source of SST observations, with improved accuracy and increased coverage in important oceanographic, meteorological, and climatic regions.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Aim: Earth observation (EO) products are a valuable alternative to spectral vegetation indices. We discuss the availability of EO products for analysing patterns in macroecology, particularly related to vegetation, on a range of spatial and temporal scales.
Location: Global.
Methods: We discuss four groups of EO products: land cover/cover change, vegetation structure and ecosystem productivity, fire detection, and digital elevation models. We address important practical issues arising from their use, such as assumptions underlying product generation, product accuracy and product transferability between spatial scales. We investigate the potential of EO products for analysing terrestrial ecosystems.
Results: Land cover, productivity and fire products are generated from long-term data using standardized algorithms to improve reliability in detecting change of land surfaces. Their global coverage renders them useful for macroecology. Their spatial resolution (e.g. GLOBCOVER vegetation, 300 m; MODIS vegetation and fire, ≥ 500 m; ASTER digital elevation, 30 m) can be a limiting factor. Canopy structure and productivity products are based on physical approaches and thus are independent of biome-specific calibrations. Active fire locations are provided in near-real time, while burnt area products show actual area burnt by fire. EO products can be assimilated into ecosystem models, and their validation information can be employed to calculate uncertainties during subsequent modelling.
Main conclusions: Owing to their global coverage and long-term continuity, EO end products can significantly advance the field of macroecology. EO products allow analyses of spatial biodiversity, seasonal dynamics of biomass and productivity, and consequences of disturbances on regional to global scales. Remaining drawbacks include inter-operability between products from different sensors and accuracy issues due to differences between assumptions and models underlying the generation of different EO products. Our review explains the nature of EO products and how they relate to particular ecological variables across scales to encourage their wider use in ecological applications.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

The self-assembly of three cosmetically active peptide amphiphiles C16-GHK, C16-KT, and C16-KTTKS (C16 denotes a hexadecyl, palmitoyl chain) used in commercial skin care products is examined. A range of spectroscopic, microscopic, and X-ray scattering methods is used to probe the secondary structure, aggregate morphology, and the nanostructure. Peptide amphiphile (PA) C16-KTTKS forms flat tapes and extended fibrillar structures with high β-sheet content. In contrast, C16-KT and C16-GHK exhibit crystal-like aggregates with, in the case of the latter PA, lower β-sheet content. All three PA samples show spacings from bilayer structures in small-angle X-ray scattering profiles, and all three have similar critical aggregation concentrations, this being governed by the lipid chain length. However, only C16-KTTKS is stained by Congo red, a diagnostic dye used to detect amyloid formation, and this PA also shows a highly aligned cross-β X-ray diffraction pattern consistent with the high β-sheet content in the self-assembled aggregates. These findings may provide important insights relevant to the role of self-assembled aggregates on the reported collagen-stimulating properties of these PAs.

Relevance:

20.00% 20.00%

Publisher:

Abstract:

Pine wood and barley straw biochar amendments to Kettering and Cameroon sandy silt loam soils (15, 30, or 150 mg biochar g−1 soil) caused significant reductions (up to 80%,