944 resultados para Data processing methods
Resumo:
AMS Subj. Classification: 62P10, 62H30, 68T01
Resumo:
As massive data sets become increasingly available, people are facing the problem of how to effectively process and understand these data. Traditional sequential computing models are giving way to parallel and distributed computing models, such as MapReduce, both due to the large size of the data sets and their high dimensionality. This dissertation, as in the same direction of other researches that are based on MapReduce, tries to develop effective techniques and applications using MapReduce that can help people solve large-scale problems. Three different problems are tackled in the dissertation. The first one deals with processing terabytes of raster data in a spatial data management system. Aerial imagery files are broken into tiles to enable data parallel computation. The second and third problems deal with dimension reduction techniques that can be used to handle data sets of high dimensionality. Three variants of the nonnegative matrix factorization technique are scaled up to factorize matrices of dimensions in the order of millions in MapReduce based on different matrix multiplication implementations. Two algorithms, which compute CANDECOMP/PARAFAC and Tucker tensor decompositions respectively, are parallelized in MapReduce based on carefully partitioning the data and arranging the computation to maximize data locality and parallelism.
Resumo:
The generation of heterogeneous big data sources with ever increasing volumes, velocities and veracities over the he last few years has inspired the data science and research community to address the challenge of extracting knowledge form big data. Such a wealth of generated data across the board can be intelligently exploited to advance our knowledge about our environment, public health, critical infrastructure and security. In recent years we have developed generic approaches to process such big data at multiple levels for advancing decision-support. It specifically concerns data processing with semantic harmonisation, low level fusion, analytics, knowledge modelling with high level fusion and reasoning. Such approaches will be introduced and presented in context of the TRIDEC project results on critical oil and gas industry drilling operations and also the ongoing large eVacuate project on critical crowd behaviour detection in confined spaces.
Resumo:
The Data Processing Department of ISHC has developed coding forms to be used for the data to be entered into the program. The Highway Planning and Programming and the Design Departments are responsible for coding and submitting the necessary data forms to Data Processing for the noise prediction on the highway sections.
Resumo:
Background: The impact of cancer upon children, teenagers and young people can be profound. Research has been undertaken to explore the impacts upon children, teenagers and young people with cancer, but little is known about how researchers can ‘best’ engage with this group to explore their experiences. This review paper provides an overview of the utility of data collection methods employed when undertaking research with children, teenagers and young people. A systematic review of relevant databases was undertaken utilising the search terms ‘young people’, ‘young adult’, ‘adolescent’ and ‘data collection methods’. The full-text of the papers that were deemed eligible from the title and abstract were accessed and following discussion within the research team, thirty papers were included. Findings: Due to the heterogeneity in terms of the scope of the papers identified the following data collections methods were included in the results section. Three of the papers identified provided an overview of data collection methods utilised with this population and the remaining twenty seven papers covered the following data collection methods: Digital technologies; art based research; comparing the use of ‘paper and pencil’ research with web-based technologies, the use of games; the use of a specific communication tool; questionnaires and interviews; focus groups and telephone interviews/questionnaires. The strengths and limitations of the range of data collection methods included are discussed drawing upon such issues as of the appropriateness of particular methods for particular age groups, or the most appropriate method to employ when exploring a particularly sensitive topic area. Conclusions: There are a number of data collection methods utilised to undertaken research with children, teenagers and young adults. This review provides a summary of the current available evidence and an overview of the strengths and limitations of data collection methods employed.
Resumo:
By providing vehicle-to-vehicle and vehicle-to-infrastructure wireless communications, vehicular ad hoc networks (VANETs), also known as the “networks on wheels”, can greatly enhance traffic safety, traffic efficiency and driving experience for intelligent transportation system (ITS). However, the unique features of VANETs, such as high mobility and uneven distribution of vehicular nodes, impose critical challenges of high efficiency and reliability for the implementation of VANETs. This dissertation is motivated by the great application potentials of VANETs in the design of efficient in-network data processing and dissemination. Considering the significance of message aggregation, data dissemination and data collection, this dissertation research targets at enhancing the traffic safety and traffic efficiency, as well as developing novel commercial applications, based on VANETs, following four aspects: 1) accurate and efficient message aggregation to detect on-road safety relevant events, 2) reliable data dissemination to reliably notify remote vehicles, 3) efficient and reliable spatial data collection from vehicular sensors, and 4) novel promising applications to exploit the commercial potentials of VANETs. Specifically, to enable cooperative detection of safety relevant events on the roads, the structure-less message aggregation (SLMA) scheme is proposed to improve communication efficiency and message accuracy. The scheme of relative position based message dissemination (RPB-MD) is proposed to reliably and efficiently disseminate messages to all intended vehicles in the zone-of-relevance in varying traffic density. Due to numerous vehicular sensor data available based on VANETs, the scheme of compressive sampling based data collection (CS-DC) is proposed to efficiently collect the spatial relevance data in a large scale, especially in the dense traffic. In addition, with novel and efficient solutions proposed for the application specific issues of data dissemination and data collection, several appealing value-added applications for VANETs are developed to exploit the commercial potentials of VANETs, namely general purpose automatic survey (GPAS), VANET-based ambient ad dissemination (VAAD) and VANET based vehicle performance monitoring and analysis (VehicleView). Thus, by improving the efficiency and reliability in in-network data processing and dissemination, including message aggregation, data dissemination and data collection, together with the development of novel promising applications, this dissertation will help push VANETs further to the stage of massive deployment.
Resumo:
The discovery of new materials and their functions has always been a fundamental component of technological progress. Nowadays, the quest for new materials is stronger than ever: sustainability, medicine, robotics and electronics are all key assets which depend on the ability to create specifically tailored materials. However, designing materials with desired properties is a difficult task, and the complexity of the discipline makes it difficult to identify general criteria. While scientists developed a set of best practices (often based on experience and expertise), this is still a trial-and-error process. This becomes even more complex when dealing with advanced functional materials. Their properties depend on structural and morphological features, which in turn depend on fabrication procedures and environment, and subtle alterations leads to dramatically different results. Because of this, materials modeling and design is one of the most prolific research fields. Many techniques and instruments are continuously developed to enable new possibilities, both in the experimental and computational realms. Scientists strive to enforce cutting-edge technologies in order to make progress. However, the field is strongly affected by unorganized file management, proliferation of custom data formats and storage procedures, both in experimental and computational research. Results are difficult to find, interpret and re-use, and a huge amount of time is spent interpreting and re-organizing data. This also strongly limit the application of data-driven and machine learning techniques. This work introduces possible solutions to the problems described above. Specifically, it talks about developing features for specific classes of advanced materials and use them to train machine learning models and accelerate computational predictions for molecular compounds; developing method for organizing non homogeneous materials data; automate the process of using devices simulations to train machine learning models; dealing with scattered experimental data and use them to discover new patterns.
Resumo:
Optical monitoring systems are necessary to manufacture multilayer thin-film optical filters with low tolerance on spectrum specification. Furthermore, to have better accuracy on the measurement of film thickness, direct monitoring is a must. Direct monitoring implies acquiring spectrum data from the optical component undergoing the film deposition itself, in real time. In making film depositions on surfaces of optical components, the high vacuum evaporator chamber is the most popular equipment. Inside the evaporator, at the top of the chamber, there is a metallic support with several holes where the optical components are assembled. This metallic support has rotary motion to promote film homogenization. To acquire a measurement of the spectrum of the film in deposition, it is necessary to pass a light beam through a glass witness undergoing the film deposition process, and collect a sample of the light beam using a spectrometer. As both the light beam and the light collector are stationary, a synchronization system is required to identify the moment at which the optical component passes through the light beam.
Resumo:
We describe a novel approach to explore DNA nucleotide sequence data, aiming to produce high-level categorical and structural information about the underlying chromosomes, genomes and species. The article starts by analyzing chromosomal data through histograms using fixed length DNA sequences. After creating the DNA-related histograms, a correlation between pairs of histograms is computed, producing a global correlation matrix. These data are then used as input to several data processing methods for information extraction and tabular/graphical output generation. A set of 18 species is processed and the extensive results reveal that the proposed method is able to generate significant and diversified outputs, in good accordance with current scientific knowledge in domains such as genomics and phylogenetics.
Resumo:
Land plants have had the reputation of being problematic for DNA barcoding for two general reasons: (i) the standard DNA regions used in algae, animals and fungi have exceedingly low levels of variability and (ii) the typically used land plant plastid phylogenetic markers (e.g. rbcL, trnL-F, etc.) appear to have too little variation. However, no one has assessed how well current phylogenetic resources might work in the context of identification (versus phylogeny reconstruction). In this paper, we make such an assessment, particularly with two of the markers commonly sequenced in land plant phylogenetic studies, plastid rbcL and internal transcribed spacers of the large subunits of nuclear ribosomal DNA (ITS), and find that both of these DNA regions perform well even though the data currently available in GenBank/EBI were not produced to be used as barcodes and BLAST searches are not an ideal tool for this purpose. These results bode well for the use of even more variable regions of plastid DNA (such as, for example, psbA-trnH) as barcodes, once they have been widely sequenced. In the short term, efforts to bring land plant barcoding up to the standards being used now in other organisms should make swift progress. There are two categories of DNA barcode users, scientists in fields other than taxonomy and taxonomists. For the former, the use of mitochondrial and plastid DNA, the two most easily assessed genomes, is at least in the short term a useful tool that permits them to get on with their studies, which depend on knowing roughly which species or species groups they are dealing with, but these same DNA regions have important drawbacks for use in taxonomic studies (i.e. studies designed to elucidate species limits). For these purposes, DNA markers from uniparentally (usually maternally) inherited genomes can only provide half of the story required to improve taxonomic standards being used in DNA barcoding. In the long term, we will need to develop more sophisticated barcoding tools, which would be multiple, low-copy nuclear markers with sufficient genetic variability and PCR-reliability; these would permit the detection of hybrids and permit researchers to identify the 'genetic gaps' that are useful in assessing species limits.
Resumo:
The CORNISH project is the highest resolution radio continuum survey of the Galactic plane to date. It is the 5 GHz radio continuum part of a series of multi-wavelength surveys that focus on the northern GLIMPSE region (10° < l < 65°), observed by the Spitzer satellite in the mid-infrared. Observations with the Very Large Array in B and BnA configurations have yielded a 1.''5 resolution Stokes I map with a root mean square noise level better than 0.4 mJy beam 1. Here we describe the data-processing methods and data characteristics, and present a new, uniform catalog of compact radio emission. This includes an implementation of automatic deconvolution that provides much more reliable imaging than standard CLEANing. A rigorous investigation of the noise characteristics and reliability of source detection has been carried out. We show that the survey is optimized to detect emission on size scales up to 14'' and for unresolved sources the catalog is more than 90% complete at a flux density of 3.9 mJy. We have detected 3062 sources above a 7σ detection limit and present their ensemble properties. The catalog is highly reliable away from regions containing poorly sampled extended emission, which comprise less than 2% of the survey area. Imaging problems have been mitigated by down-weighting the shortest spacings and potential artifacts flagged via a rigorous manual inspection with reference to the Spitzer infrared data. We present images of the most common source types found: H II regions, planetary nebulae, and radio galaxies. The CORNISH data and catalog are available online at http://cornish.leeds.ac.uk.
Resumo:
In this thesis mainly long quasi-periodic solar oscillations in various solar atmospheric structures are discussed, based on data obtained at several wavelengths, focussing, however, mainly on radio frequencies. Sunspot (Articles II and III) and quiet Sun area (QSA) (Article I) oscillations are investigated along with quasi-periodic pulsations (QPP) in a flaring event with wide-range radio spectra (Article IV). Various oscillation periods are detected; 3–15, 35–70 and 90 minutes (QSA), 10-60 and 80-130 minutes (in sunspots at various radio frequencies), 3-5, 10-23, 220-240, 340 and 470 minutes (in sunspots at photosphere) and 8-12 and 15-17 seconds (in a solar flare at radio frequencies). Some of the oscillation periods are detected for the first time, while some of them have been confirmed earlier by other research groups. Solar oscillations can provide more information on the nature of various solar structures. This thesis presents the physical mechanisms of some solar structure oscillations. Two different theoretical approaches are chosen; magnetohydrodynamics (MHD) and the shallow sunspot model. These two theories can explain a wide range of solar oscillations from a few seconds up to some hours. Various wave modes in loop structures cause solar oscillations (<45 minutes) both in sunspots and quiet Sun areas. Periods lasting more than 45 minutes in the sunspots (and a fraction of the shorter periods) are related to sunspot oscillations as a whole. Sometimes similar oscillation periods are detected both in sunspot area variations and respectively in magnetic field strength changes. This result supports a concept that these oscillations are related to sunspot oscillations as a whole. In addition, a theory behind QPPs at radio frequencies in solar flares is presented. The thesis also covers solar instrumentation and data sources. Additionally, the data processing methods are presented. As the majority of the investigations in this thesis focus on radio frequencies, also the most typical radio emission mechanisms are presented. The main structures of the Sun, which are related to solar oscillations, are also presented. Two separate projects are included in this thesis. Solar cyclicity is studied using the extensively large solar radio map archieve from Metsähovi Radio Observatory (MRO) at 37 GHz, between 1978 and 2011 (Article V) covering two full solar cycles. Also, some new solar instrumentation (Article VI) was developed during this thesis.
Resumo:
Ion mobility spectrometry (IMS) is a straightforward, low cost method for fast and sensitive determination of organic and inorganic analytes. Originally this portable technique was applied to the determination of gas phase compounds in security and military use. Nowadays, IMS has received increasing attention in environmental and biological analysis, and in food quality determination. This thesis consists of literature review of suitable sample preparation and introduction methods for liquid matrices applicable to IMS from its early development stages to date. Thermal desorption, solid phase microextraction (SPME) and membrane extraction were examined in experimental investigations of hazardous aquatic pollutants and potential pollutants. Also the effect of different natural waters on the extraction efficiency was studied, and the utilised IMS data processing methods are discussed. Parameters such as extraction and desorption temperatures, extraction time, SPME fibre depth, SPME fibre type and salt addition were examined for the studied sample preparation and introduction methods. The observed critical parameters were extracting material and temperature. The extraction methods showed time and cost effectiveness because sampling could be performed in single step procedures and from different natural water matrices within a few minutes. Based on these experimental and theoretical studies, the most suitable method to test in the automated monitoring system is membrane extraction. In future an IMS based early warning system for monitoring water pollutants could ensure the safe supply of drinking water. IMS can also be utilised for monitoring natural waters in cases of environmental leakage or chemical accidents. When combined with sophisticated sample introduction methods, IMS possesses the potential for both on-line and on-site identification of analytes in different water matrices.
Resumo:
OBJECTIVES: To develop a method for objective assessment of fine motor timing variability in Parkinson’s disease (PD) patients, using digital spiral data gathered by a touch screen device. BACKGROUND: A retrospective analysis was conducted on data from 105 subjects including65 patients with advanced PD (group A), 15 intermediate patients experiencing motor fluctuations (group I), 15 early stage patients (group S), and 10 healthy elderly subjects (HE) were examined. The subjects were asked to perform repeated upper limb motor tasks by tracing a pre-drawn Archimedes spiral as shown on the screen of the device. The spiral tracing test was performed using an ergonomic pen stylus, using dominant hand. The test was repeated three times per test occasion and the subjects were instructed to complete it within 10 seconds. Digital spiral data including stylus position (x-ycoordinates) and timestamps (milliseconds) were collected and used in subsequent analysis. The total number of observations with the test battery were as follows: Swedish group (n=10079), Italian I group (n=822), Italian S group (n = 811), and HE (n=299). METHODS: The raw spiral data were processed with three data processing methods. To quantify motor timing variability during spiral drawing tasks Approximate Entropy (APEN) method was applied on digitized spiral data. APEN is designed to capture the amount of irregularity or complexity in time series. APEN requires determination of two parameters, namely, the window size and similarity measure. In our work and after experimentation, window size was set to 4 and similarity measure to 0.2 (20% of the standard deviation of the time series). The final score obtained by APEN was normalized by total drawing completion time and used in subsequent analysis. The score generated by this method is hence on denoted APEN. In addition, two more methods were applied on digital spiral data and their scores were used in subsequent analysis. The first method was based on Digital Wavelet Transform and Principal Component Analysis and generated a score representing spiral drawing impairment. The score generated by this method is hence on denoted WAV. The second method was based on standard deviation of frequency filtered drawing velocity. The score generated by this method is hence on denoted SDDV. Linear mixed-effects (LME) models were used to evaluate mean differences of the spiral scores of the three methods across the four subject groups. Test-retest reliability of the three scores was assessed after taking mean of the three possible correlations (Spearman’s rank coefficients) between the three test trials. Internal consistency of the methods was assessed by calculating correlations between their scores. RESULTS: When comparing mean spiral scores between the four subject groups, the APEN scores were different between HE subjects and three patient groups (P=0.626 for S group with 9.9% mean value difference, P=0.089 for I group with 30.2%, and P=0.0019 for A group with 44.1%). However, there were no significant differences in mean scores of the other two methods, except for the WAV between the HE and A groups (P<0.001). WAV and SDDV were highly and significantly correlated to each other with a coefficient of 0.69. However, APEN was not correlated to neither WAV nor SDDV with coefficients of 0.11 and 0.12, respectively. Test-retest reliability coefficients of the three scores were as follows: APEN (0.9), WAV(0.83) and SD-DV (0.55). CONCLUSIONS: The results show that the digital spiral analysis-based objective APEN measure is able to significantly differentiate the healthy subjects from patients at advanced level. In contrast to the other two methods (WAV and SDDV) that are designed to quantify dyskinesias (over-medications), this method can be useful for characterizing Off symptoms in PD. The APEN was not correlated to none of the other two methods indicating that it measures a different construct of upper limb motor function in PD patients than WAV and SDDV. The APEN also had a better test-retest reliability indicating that it is more stable and consistent over time than WAV and SDDV.
Resumo:
The Compact Muon Solenoid (CMS) detector is described. The detector operates at the Large Hadron Collider (LHC) at CERN. It was conceived to study proton-proton (and lead-lead) collisions at a centre-of-mass energy of 14 TeV (5.5 TeV nucleon-nucleon) and at luminosities up to 10(34)cm(-2)s(-1) (10(27)cm(-2)s(-1)). At the core of the CMS detector sits a high-magnetic-field and large-bore superconducting solenoid surrounding an all-silicon pixel and strip tracker, a lead-tungstate scintillating-crystals electromagnetic calorimeter, and a brass-scintillator sampling hadron calorimeter. The iron yoke of the flux-return is instrumented with four stations of muon detectors covering most of the 4 pi solid angle. Forward sampling calorimeters extend the pseudo-rapidity coverage to high values (vertical bar eta vertical bar <= 5) assuring very good hermeticity. The overall dimensions of the CMS detector are a length of 21.6 m, a diameter of 14.6 m and a total weight of 12500 t.