83 results for Large Data
Abstract:
Support vector machine (SVM) is a powerful technique for data classification. Despite its sound theoretical foundations and high classification accuracy, standard SVM is not suitable for classifying large data sets, because its training complexity depends heavily on the size of the data set. This paper presents a novel SVM classification approach for large data sets that uses minimum enclosing ball clustering. After the training data are partitioned by the proposed clustering method, the cluster centers are used for a first-stage SVM classification. The clusters whose centers are support vectors, together with the clusters that contain more than one class, are then used for a second-stage SVM classification; at this stage most of the data are removed. Experimental results show that the proposed approach achieves classification accuracy comparable to classic SVM while training significantly faster than several other SVM classifiers.
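A minimal sketch of this reduce-then-refine flow, using scikit-learn with k-means standing in for the paper's minimum enclosing ball clustering; the function name, cluster count, and kernel choice are illustrative assumptions rather than the authors' settings:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def two_stage_svm(X, y, n_clusters=200):
    """Reduce-then-refine SVM training; y is assumed to hold integer class
    labels starting at 0. k-means is a stand-in for the paper's minimum
    enclosing ball clustering."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    labels = km.labels_

    # Stage 1: train on cluster centres, each labelled by its majority class.
    centre_y = np.array([np.bincount(y[labels == c]).argmax()
                         for c in range(n_clusters)])
    stage1 = SVC(kernel="rbf").fit(km.cluster_centers_, centre_y)

    # Keep clusters whose centre is a support vector, plus mixed-class clusters.
    keep_ids = set(stage1.support_) | {c for c in range(n_clusters)
                                       if len(np.unique(y[labels == c])) > 1}
    mask = np.isin(labels, list(keep_ids))

    # Stage 2: retrain on the retained points only; most data are discarded.
    return SVC(kernel="rbf").fit(X[mask], y[mask])
```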
Abstract:
Tri-axial accelerometer data loggers are increasingly popular for quantifying animal activity through the analysis of signature traces. However, there is no consensus on how to process the large data sets that these devices generate when recording at the necessary high sample rates. In addition, there have been few attempts to validate accelerometer traces against specific behaviours in non-domesticated terrestrial mammals.
Abstract:
Proper application of stable isotopes (e.g., δ15N and δ13C) to food web analysis requires an understanding of all nondietary factors that contribute to isotopic variability. Lipid extraction is often used during stable isotope analysis (SIA) because synthesized lipids have a low δ13C and can mask the δ13C of a consumer's diet. Recent studies indicate that lipid extraction intended to adjust δ13C may also cause shifts in δ15N, but the magnitude of and reasons for the shift are highly uncertain. We examined a large data set (n = 854) for effects of lipid extraction (using Bligh and Dyer's [1959] chloroform-methanol solvent mixtures) on the δ15N of aquatic consumers. We found no effect of chemically extracting lipids on the δ15N of whole zooplankton, unionid mussels, and fish liver samples, and found a small increase in fish muscle δ15N of ∼0.4‰. We also detected a negative relationship between the shift in δ15N following extraction and the C:N ratio in muscle tissue, suggesting that effects of extraction were greater for tissue with lower lipid content. As long as appropriate techniques such as those of Bligh and Dyer (1959) are used, effects of lipid extraction on the δ15N of aquatic consumers need not be a major consideration in the SIA of food webs.
Abstract:
Realising high-performance image and signal processing applications on modern FPGAs presents a challenging implementation problem due to the large data frames streaming through these systems. Specifically, to meet the high bandwidth and data storage demands of these applications, complex hierarchical memory architectures must be manually specified at the Register Transfer Level (RTL). Automated approaches which convert high-level operation descriptions, for instance in the form of C programs, to an FPGA architecture are unable to realise such architectures automatically. This paper presents a solution to this problem: a compiler that automatically derives such memory architectures from a C program. By transforming the input C program into a unique dataflow modelling dialect, known as Valved Dataflow (VDF), a mapping and synthesis approach developed for this dialect can be exploited to automatically create high-performance image and video processing architectures. Memory-intensive C kernels for Motion Estimation (CIF frames at 30 fps), Matrix Multiplication (128x128 @ 500 iter/sec) and Sobel Edge Detection (720p @ 30 fps), which are unrealisable by current state-of-the-art C-based synthesis tools, are automatically derived from a C description of the algorithm.
Abstract:
Large data sets of radiocarbon dates are becoming a more common feature of archaeological research. The sheer numbers of radiocarbon dates produced, however, raise issues of representation and interpretation. This paper presents a methodology that both reduces the visible impact of dating fluctuations and takes into consideration the influence of the underlying radiocarbon calibration curve. By doing so, it may be possible to distinguish between periods of human activity in early medieval Ireland and the statistical tails produced by radiocarbon calibration.
Abstract:
In security and surveillance there is an increasing need to process image data efficiently and effectively, either at source or in large data networks. Whilst Field Programmable Gate Arrays have been seen as a key technology for enabling this, they typically rely on high-level and/or hardware description language synthesis approaches; this is a major disadvantage in terms of the time needed to design or program them and to verify correct operation, and it considerably reduces the programmability of any technique based on this technology. The work here proposes a different approach: using optimised soft-core processors which can be programmed in software. In particular, the paper proposes a design tool chain for programming such processors that uses the CAL Actor Language as a starting point for describing an image processing algorithm and targets its implementation to these custom-designed, soft-core processors on FPGA. The main purpose is to exploit task and data parallelism in order to achieve the same parallelism as a previous HDL implementation while avoiding the design time, verification and debugging steps associated with such approaches.
Abstract:
Since the earliest days of cystic fibrosis (CF) treatment, patient data have been recorded and reviewed in order to identify the factors that lead to more favourable outcomes. Large data repositories, such as the US Cystic Fibrosis Registry, which was established in the 1960s, enabled successful treatments and patient outcomes to be recognized and improvement programmes to be implemented in specialist CF centres. Over the past decades, the greater volumes of data becoming available through Centre databases and patient registries led to the possibility of making comparisons between different therapies, approaches to care and indeed data recording. The quality of care for individuals with CF has become a focus at several levels: patient, centre, regional, national and international. This paper reviews the quality management and improvement issues at each of these levels with particular reference to indicators of health, the role of CF Centres, regional networks, national health policy, and international data registration and comparisons.
Abstract:
We consider an application scenario where points of interest (PoIs) each have a web presence and where a web user wants to identify a region that contains PoIs relevant to a set of keywords, e.g., in preparation for deciding where to go to conveniently explore the PoIs. Motivated by this, we propose the length-constrained maximum-sum region (LCMSR) query, which returns a spatial-network region that is located within a general region of interest, that does not exceed a given size constraint, and that best matches the query keywords. Such a query maximizes the total weight of the PoIs in it w.r.t. the query keywords. We show that answering this query is NP-hard. We develop an approximation algorithm with a (5 + ε) approximation ratio utilizing a technique that scales node weights into integers. We also propose a more efficient heuristic algorithm and a greedy algorithm. Empirical studies on real data offer detailed insight into the accuracy of the proposed algorithms and show that they are capable of computing results efficiently and effectively.
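An illustrative sketch of a greedy region-growing heuristic for this kind of length-constrained, weight-maximizing query; it is not the paper's (5 + ε)-approximation or its specific greedy algorithm, and the data-structure layout is an assumption:

```python
import heapq

def greedy_lcmsr(adj, weight, budget, seeds):
    """Greedy sketch for a length-constrained max-sum region query.

    adj:    {node: [(neighbour, edge_length), ...]}  spatial network
    weight: {node: keyword-relevance weight of the PoI there (0 if none)}
    budget: maximum total edge length the region may contain
    seeds:  candidate start nodes (e.g., the highest-weight PoIs)
    """
    best_region, best_score = set(), float("-inf")
    for s in seeds:
        region, used = {s}, 0.0
        frontier = [(length, s, v) for v, length in adj[s]]
        heapq.heapify(frontier)
        while frontier:
            # Grow along the cheapest edge that still fits in the budget.
            length, _, v = heapq.heappop(frontier)
            if v in region or used + length > budget:
                continue
            region.add(v)
            used += length
            for w, l in adj[v]:
                if w not in region:
                    heapq.heappush(frontier, (l, v, w))
        score = sum(weight.get(v, 0.0) for v in region)
        if score > best_score:
            best_region, best_score = region, score
    return best_region, best_score
```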
Abstract:
We present a method for learning Bayesian networks from data sets containing thousands of variables without the need for structure constraints. Our approach consists of two parts. The first is a novel algorithm that effectively explores the space of possible parent sets of a node. It guides the exploration towards the most promising parent sets on the basis of an approximated score function that is computed in constant time. The second part is an improvement of an existing ordering-based algorithm for structure optimization. The new algorithm provably achieves a higher score compared to its original formulation. Our novel approach consistently outperforms the state of the art on very large data sets.
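A toy illustration of the ordering-based idea (not the paper's approximated score or its pruning strategy): for a fixed variable ordering, each node exhaustively picks its best small parent set among its predecessors under a simple linear-Gaussian BIC score, which is assumed here purely for compactness.

```python
import itertools
import numpy as np

def bic_gauss(X, child, parents):
    """BIC of a linear-Gaussian model child ~ parents (higher is better)."""
    n = X.shape[0]
    y = X[:, child]
    A = np.column_stack([X[:, list(parents)], np.ones(n)]) if parents else np.ones((n, 1))
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    sigma2 = max(resid @ resid / n, 1e-12)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return loglik - 0.5 * A.shape[1] * np.log(n)

def best_parents_given_order(X, order, max_parents=2):
    """For a fixed ordering (list of column indices), each node picks its
    best-scoring parent set among its predecessors, exhaustively up to
    max_parents parents."""
    dag = {}
    for i, node in enumerate(order):
        preds = order[:i]
        candidates = [()] + [p for k in range(1, max_parents + 1)
                             for p in itertools.combinations(preds, k)]
        dag[node] = max(candidates, key=lambda ps: bic_gauss(X, node, ps))
    return dag
```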
Abstract:
Context. The ESA Rosetta spacecraft, currently orbiting around comet 67P/Churyumov-Gerasimenko, has already provided in situ measurements of the dust grain properties from several instruments, particularly OSIRIS and GIADA. We propose adding value to those measurements by combining them with ground-based observations of the dust tail to monitor the overall, time-dependent dust-production rate and size distribution.
Aims. To constrain the dust grain properties, we take Rosetta OSIRIS and GIADA results into account, and combine OSIRIS data during the approach phase (from late April to early June 2014) with a large data set of ground-based images that were acquired with the ESO Very Large Telescope (VLT) from February to November 2014.
Methods. A Monte Carlo dust tail code, which has already been used to characterise the dust environments of several comets and active asteroids, has been applied to retrieve the dust parameters. Key properties of the grains (density, velocity, and size distribution) were obtained from Rosetta observations: these parameters were used as input of the code to considerably reduce the number of free parameters. In this way, the overall dust mass-loss rate and its dependence on the heliocentric distance could be obtained accurately.
Results. The dust parameters derived from the inner coma measurements by OSIRIS and GIADA and from distant imaging using VLT data are consistent, except for the power index of the size-distribution function, which is α = −3, instead of α = −2, for grains smaller than 1 mm. This is possibly linked to the presence of fluffy aggregates in the coma. The onset of cometary activity occurs at approximately 4.3 AU, with a dust production rate of 0.5 kg/s, increasing up to 15 kg/s at 2.9 AU. This implies a dust-to-gas mass ratio varying between 3.8 and 6.5 for the best-fit model when combined with water-production rates from the MIRO experiment.
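To illustrate why the retrieved power index matters, the short calculation below compares how much of the total dust mass sits in grains smaller than 1 mm for α = −2 versus α = −3, assuming a single power-law differential size distribution and illustrative size limits of 1 μm to 1 cm (these limits are assumptions, not values from the paper):

```python
import math

def mass_fraction_below(r_cut, alpha, r_min=1e-6, r_max=1e-2):
    """Fraction of total dust mass in grains smaller than r_cut for a
    differential size distribution n(r) ∝ r**alpha (radii in metres)."""
    p = alpha + 4.0  # mass integrand ∝ r**3 * n(r) = r**(alpha + 3)
    integral = lambda a, b: math.log(b / a) if p == 0 else (b**p - a**p) / p
    return integral(r_min, r_cut) / integral(r_min, r_max)

for a in (-2.0, -3.0):
    # For these limits, alpha = -3 puts roughly ten times more mass
    # below 1 mm than alpha = -2 (about 0.10 versus about 0.01).
    print(f"alpha = {a}: {mass_fraction_below(1e-3, a):.2f} of mass below 1 mm")
```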
Abstract:
Accurate address information from health service providers is fundamental for the effective delivery of health care and for population monitoring and screening. While it is currently used in the production of key statistics such as internal migration estimates, it will become even more important over time with the 2021 Census of UK constituent countries integrating administrative data to enhance the quality of statistical outputs. Therefore, it is beneficial to improve understanding of the accuracy of address information held by health service providers and the factors that influence this. This paper builds upon previous research on the social geography of address mismatch between census and health service records in Northern Ireland. It is based on the Northern Ireland Longitudinal Study, a large data linkage study including about 28 per cent of the Northern Ireland population, which is matched between the census (2001, 2011) and the Health Card Registration System maintained by the Health and Social Care Business Service Organisation (BSO). This research compares address information from the Spring 2011 BSO download (Unique Property Reference Number, Super Output Area) with comparable geographic information from the 2011 Census. Multivariate and multilevel analyses are used to assess the individual and ecological determinants of match/mismatch between the geographical information in both data sources, to determine whether the characteristics of the associated people and places are the same as those observed in 2001. It is important to understand whether the same people are being inaccurately geographically referenced in both Census years or whether the situation is more variable.
Abstract:
We present a large data set of high-cadence dMe flare light curves obtained with custom continuum filters on the triple-beam, high-speed camera system ULTRACAM. The measurements provide constraints for models of the near-ultraviolet (NUV) and optical continuum spectral evolution on timescales of ≈1 s. We provide a robust interpretation of the flare emission in the ULTRACAM filters using simultaneously obtained low-resolution spectra during two moderate-sized flares in the dM4.5e star YZ CMi. By avoiding the spectral complexity within the broadband Johnson filters, the ULTRACAM filters are shown to characterize bona fide continuum emission in the NUV, blue, and red wavelength regimes. The NUV/blue flux ratio in flares is equivalent to a Balmer jump ratio, and the blue/red flux ratio provides an estimate for the color temperature of the optical continuum emission. We present a new “color-color” relationship for these continuum flux ratios at the peaks of the flares. Using the RADYN and RH codes, we interpret the ULTRACAM filter emission using the dominant emission processes from a radiative-hydrodynamic flare model with a high nonthermal electron beam flux, which explains a hot, T ≈ 10^4 K, color temperature at blue-to-red optical wavelengths and a small Balmer jump ratio as observed in moderate-sized and large flares alike. We also discuss the high time resolution, high signal-to-noise continuum color variations observed in YZ CMi during a giant flare, which increased the NUV flux from this star by over a factor of 100. Based on observations obtained with the Apache Point Observatory 3.5 m telescope, which is owned and operated by the Astrophysical Research Consortium; on observations made with the William Herschel Telescope, operated on the island of La Palma by the Isaac Newton Group in the Spanish Observatorio del Roque de los Muchachos of the Instituto de Astrofísica de Canarias; and on observations made with the ESO Telescopes at the La Silla Paranal Observatory under programme ID 085.D-0501(A).
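As a hedged illustration of how a blue/red continuum flux ratio maps to a color temperature, the sketch below inverts the ratio of Planck functions at two assumed bandpass centres (450 nm and 650 nm, chosen for illustration and not the actual ULTRACAM passbands), treating the flare continuum as a blackbody:

```python
import math
from scipy.optimize import brentq

H, C, K_B = 6.626e-34, 2.998e8, 1.381e-23  # SI: Planck, speed of light, Boltzmann

def planck(wavelength, temp):
    """Blackbody spectral radiance B_lambda(T); wavelength in metres, T in K."""
    x = H * C / (wavelength * K_B * temp)
    return (2.0 * H * C**2 / wavelength**5) / math.expm1(x)

def color_temperature(blue_red_ratio, lam_blue=4.5e-7, lam_red=6.5e-7):
    """Invert the blue/red blackbody flux ratio for a color temperature.
    Assumes the ratio is bracketed by the values at 10^3 K and 10^5 K."""
    f = lambda temp: planck(lam_blue, temp) / planck(lam_red, temp) - blue_red_ratio
    return brentq(f, 1e3, 1e5)

# With these illustrative bandpasses, a ratio near 2.2 gives T close to 10^4 K.
print(f"{color_temperature(2.2):.0f} K")
```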