914 resultados para statistical data analysis


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Code division multiple access (CDMA) in which the spreading code assignment to users contains a random element has recently become a cornerstone of CDMA research. The random element in the construction is particularly attractive as it provides robustness and flexibility in utilizing multiaccess channels, whilst not making significant sacrifices in terms of transmission power. Random codes are generated from some ensemble; here we consider the possibility of combining two standard paradigms, sparsely and densely spread codes, in a single composite code ensemble. The composite code analysis includes a replica symmetric calculation of performance in the large system limit, and investigation of finite systems through a composite belief propagation algorithm. A variety of codes are examined with a focus on the high multi-access interference regime. We demonstrate scenarios both in the large size limit and for finite systems in which the composite code has typical performance exceeding those of sparse and dense codes at equivalent signal to noise ratio.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Objective: Recently, much research has been proposed using nature inspired algorithms to perform complex machine learning tasks. Ant colony optimization (ACO) is one such algorithm based on swarm intelligence and is derived from a model inspired by the collective foraging behavior of ants. Taking advantage of the ACO in traits such as self-organization and robustness, this paper investigates ant-based algorithms for gene expression data clustering and associative classification. Methods and material: An ant-based clustering (Ant-C) and an ant-based association rule mining (Ant-ARM) algorithms are proposed for gene expression data analysis. The proposed algorithms make use of the natural behavior of ants such as cooperation and adaptation to allow for a flexible robust search for a good candidate solution. Results: Ant-C has been tested on the three datasets selected from the Stanford Genomic Resource Database and achieved relatively high accuracy compared to other classical clustering methods. Ant-ARM has been tested on the acute lymphoblastic leukemia (ALL)/acute myeloid leukemia (AML) dataset and generated about 30 classification rules with high accuracy. Conclusions: Ant-C can generate optimal number of clusters without incorporating any other algorithms such as K-means or agglomerative hierarchical clustering. For associative classification, while a few of the well-known algorithms such as Apriori, FP-growth and Magnum Opus are unable to mine any association rules from the ALL/AML dataset within a reasonable period of time, Ant-ARM is able to extract associative classification rules.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

DUE TO COPYRIGHT RESTRICTIONS ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Consideration of the influence of test technique and data analysis method is important for data comparison and design purposes. The paper highlights the effects of replication interval, crack growth rate averaging and curve-fitting procedures on crack growth rate results for a Ni-base alloy. It is shown that an upper bound crack growth rate line is not appropriate for use in fatigue design, and that the derivative of a quadratic fit to the a vs N data looks promising. However, this type of averaging, or curve fitting, is not useful in developing an understanding of microstructure/crack tip interactions. For this purpose, simple replica-to-replica growth rate calculations are preferable. © 1988.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Developers of interactive software are confronted by an increasing variety of software tools to help engineer the interactive aspects of software applications. Not only do these tools fall into different categories in terms of functionality, but within each category there is a growing number of competing tools with similar, although not identical, features. Choice of user interface development tool (UIDT) is therefore becoming increasingly complex.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

DUE TO INCOMPLETE PAPERWORK, ONLY AVAILABLE FOR CONSULTATION AT ASTON UNIVERSITY LIBRARY AND INFORMATION SERVICES WITH PRIOR ARRANGEMENT

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Market research is often conducted through conventional methods such as surveys, focus groups and interviews. But the drawbacks of these methods are that they can be costly and timeconsuming. This study develops a new method, based on a combination of standard techniques like sentiment analysis and normalisation, to conduct market research in a manner that is free and quick. The method can be used in many application-areas, but this study focuses mainly on the veganism market to identify vegan food preferences in the form of a profile. Several food words are identified, along with their distribution between positive and negative sentiments in the profile. Surprisingly, non-vegan foods such as cheese, cake, milk, pizza and chicken dominate the profile, indicating that there is a significant market for vegan-suitable alternatives for such foods. Meanwhile, vegan-suitable foods such as coconut, potato, blueberries, kale and tofu also make strong appearances in the profile. Validation is performed by using the method on Volkswagen vehicle data to identify positive and negative sentiment across five car models. Some results were found to be consistent with sales figures and expert reviews, while others were inconsistent. The reliability of the method is therefore questionable, so the results should be used with caution.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The object of this report is to present the data and conclusions drawn from the analysis of the origin and destination information. Comments on the advisability and correctness of the approach used by Iowa are encouraged.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

New morpho-bathymetric and tectono-stratigraphic data on Naples and Salerno Gulfs, derived from bathymetric and seismic data analysis and integrated geologic interpretation are here presented. The CUBE(Combined Uncertainty Bathymetric Estimator) method has been applied to complex morphologies, such as the Capri continental slope and the related geological structures occurring in the Salerno Gulf.The bathymetric data analysis has been carried out for marine geological maps of the whole Campania continental margin at scales ranging from 1:25.000 to 1:10.000, including focused examples in Naples and Salerno Gulfs, Naples harbour, Capri and Ischia Islands and Salerno Valley. Seismic data analysis has allowed for the correlation of main morpho-structural lineaments recognized at a regional scale through multichannel profiles with morphological features cropping out at the sea bottom, evident from bathymetry.Main fault systems in the area have been represented on a tectonic sketch map, including the master fault located northwards to the Salerno Valley half graben. Some normal faults parallel to the master fault have been interpreted from the slope map derived from bathymetric data. A complex system of antithetic faults bound two morpho-structural highs located 20km to the south of the Capri Island. Some hints of compressional reactivation of normal faults in an extensional setting involving the whole Campania continental margin have been shown from seismic interpretation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Understanding how aquatic species grow is fundamental in fisheries because stock assessment often relies on growth dependent statistical models. Length-frequency-based methods become important when more applicable data for growth model estimation are either not available or very expensive. In this article, we develop a new framework for growth estimation from length-frequency data using a generalized von Bertalanffy growth model (VBGM) framework that allows for time-dependent covariates to be incorporated. A finite mixture of normal distributions is used to model the length-frequency cohorts of each month with the means constrained to follow a VBGM. The variances of the finite mixture components are constrained to be a function of mean length, reducing the number of parameters and allowing for an estimate of the variance at any length. To optimize the likelihood, we use a minorization–maximization (MM) algorithm with a Nelder–Mead sub-step. This work was motivated by the decline in catches of the blue swimmer crab (BSC) (Portunus armatus) off the east coast of Queensland, Australia. We test the method with a simulation study and then apply it to the BSC fishery data.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Three types of forecasts of the total Australian production of macadamia nuts (t nut-in-shell) have been produced early each year since 2001. The first is a long-term forecast, based on the expected production from the tree census data held by the Australian Macadamia Society, suitably scaled up for missing data and assumed new plantings each year. These long-term forecasts range out to 10 years in the future, and form a basis for industry and market planning. Secondly, a statistical adjustment (termed the climate-adjusted forecast) is made annually for the coming crop. As the name suggests, climatic influences are the dominant factors in this adjustment process, however, other terms such as bienniality of bearing, prices and orchard aging are also incorporated. Thirdly, industry personnel are surveyed early each year, with their estimates integrated into a growers and pest-scouts forecast. Initially conducted on a 'whole-country' basis, these models are now constructed separately for the six main production regions of Australia, with these being combined for national totals. Ensembles or suites of step-forward regression models using biologically-relevant variables have been the major statistical method adopted, however, developing methodologies such as nearest-neighbour techniques, general additive models and random forests are continually being evaluated in parallel. The overall error rates average 14% for the climate forecasts, and 12% for the growers' forecasts. These compare with 7.8% for USDA almond forecasts (based on extensive early-crop sampling) and 6.8% for coconut forecasts in Sri Lanka. However, our somewhatdisappointing results were mainly due to a series of poor crops attributed to human reasons, which have now been factored into the models. Notably, the 2012 and 2013 forecasts averaged 7.8 and 4.9% errors, respectively. Future models should also show continuing improvement, as more data-years become available.