944 results for Data snooping bias
Abstract:
This study was undertaken by UKOLN on behalf of the Joint Information Systems Committee (JISC) in the period April to September 2008. Application profiles are metadata schemata which consist of data elements drawn from one or more namespaces, optimized for a particular local application. They offer a way for particular communities to base the interoperability specifications they create and use for their digital material on established open standards. This offers the potential for digital materials to be accessed, used and curated effectively both within and beyond the communities in which they were created. The JISC recognized the need to undertake a scoping study to investigate metadata application profile requirements for scientific data in relation to digital repositories, and specifically concerning descriptive metadata to support resource discovery and other functions such as preservation. This followed on from the development of the Scholarly Works Application Profile (SWAP) undertaken within the JISC Digital Repositories Programme and led by Andy Powell (Eduserv Foundation) and Julie Allinson (RRT UKOLN) on behalf of the JISC.
Aims and Objectives: 1. To assess whether a single metadata AP for research data, or a small number thereof, would improve resource discovery or discovery-to-delivery in any useful or significant way. 2. If so, then to: (a) assess whether the development of such AP(s) is practical and, if so, how much effort it would take; (b) scope a community uptake strategy that is likely to be successful, identifying the main barriers and key stakeholders. 3. Otherwise, to investigate how best to improve cross-discipline, cross-community discovery-to-delivery for research data, and make recommendations to the JISC and others as appropriate.
Approach: The Study used a broad conception of what constitutes scientific data, namely data gathered, collated, structured and analysed using a recognizably scientific method, with a bias towards quantitative methods.
The approach taken was to map out the landscape of existing data centres, repositories and associated projects, and conduct a survey of the discovery-to-delivery metadata they use or have defined, alongside any insights they have gained from working with this metadata. This was followed up by a series of unstructured interviews, discussing use cases for a Scientific Data Application Profile, and how widely a single profile might be applied. On the latter point, matters of granularity, the experimental/measurement contrast, the quantitative/qualitative contrast, the raw/derived data contrast, and the homogeneous/heterogeneous data collection contrast were discussed. The Study report was loosely structured according to the Singapore Framework for Dublin Core Application Profiles, and in turn considered: the possible use cases for a Scientific Data Application Profile; existing domain models that could either be used or adapted for use within such a profile; and a comparison of existing metadata profiles and standards to identify candidate elements for inclusion in the description set profile for scientific data. The report also considered how the application profile might be implemented, its relationship to other application profiles, the alternatives to constructing a Scientific Data Application Profile, the development effort required, and what could be done to encourage uptake in the community. The conclusions of the Study were validated through a reference group of stakeholders.
Abstract:
In this work we study the characteristics of rapidity-gap distributions in samples of minimum bias events from pp collisions at √s = 7 TeV at CMS/LHC. Such events consist of diffractive processes in addition to soft QCD processes. We investigate the size and location of the gaps, as well as the correlations between the distributions obtained from objects reconstructed in the detector and the distributions obtained from particles generated via Monte Carlo simulation. A good understanding of these distributions may eventually make it possible to characterize diffractive events in the data.
Abstract:
Data obtained from tagging experiments initiated during 1953–1958 and 1969–1981 for skipjack tuna from the coastal eastern Pacific Ocean (EPO) are reanalyzed using the Schnute generalized growth model. The objective is to provide information that can be used to generate a growth transition matrix for use in a length-structured population dynamics model. The analysis uses statistical approaches to incorporate individual variability in growth as a function of length at release and time at liberty, measurement error, and transcription error. The tagging data are divided into northern and southern regions, and the results suggest that growth rates differ between the two regions. The Schnute model provides a significantly better fit to the data than the von Bertalanffy model, a sub-model of the Schnute model, for the northern region, but not for the southern region. Individual variation in growth is best described as a function of time at liberty and as a function of growth increment for the northern and southern regions, respectively. Measurement error is a significant part of the total variation, but the results suggest that there is no bias caused by the measurement error. Additional information, particularly for small and large fish, is needed to produce an adequate growth transition matrix that can be used in a length-structured population dynamics model for skipjack tuna in the EPO.
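The Schnute model mentioned above is a general four-parameter growth curve that contains von Bertalanffy growth as a sub-model, which is what makes the nested model comparison in the abstract possible. As a minimal sketch (parameter values in the comments are illustrative, not estimates from the study), the common case of the Schnute (1981) model can be written as:

```python
import math

def schnute_length(t, y1, y2, tau1, tau2, a, b):
    # Schnute (1981) general growth model, case a != 0, b != 0:
    #   L(t) = [ y1^b + (y2^b - y1^b)
    #            * (1 - exp(-a*(t - tau1))) / (1 - exp(-a*(tau2 - tau1))) ]^(1/b)
    # y1, y2 are sizes at the reference ages tau1, tau2.
    num = 1.0 - math.exp(-a * (t - tau1))
    den = 1.0 - math.exp(-a * (tau2 - tau1))
    return (y1 ** b + (y2 ** b - y1 ** b) * num / den) ** (1.0 / b)

def von_bertalanffy_length(t, l_inf, k, t0):
    # von Bertalanffy sub-model: L(t) = L_inf * (1 - exp(-K*(t - t0)))
    return l_inf * (1.0 - math.exp(-k * (t - t0)))
```

By construction the curve passes through (tau1, y1) and (tau2, y2), which is convenient when fitting to tagging data where sizes at two reference ages are better determined than asymptotic size.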
Abstract:
Estimating the abundance of cetaceans from aerial survey data requires careful attention to survey design and analysis. Once an aerial observer perceives a marine mammal or group of marine mammals, he or she has only a few seconds to identify and enumerate the individuals sighted, determine the distance to the sighting, and record this information. In line-transect survey analyses, it is assumed that the observer has correctly identified and enumerated the group or individual. We describe methods used to test this assumption and how survey data should be adjusted to account for observer errors. Harbor porpoises (Phocoena phocoena) were censused during aerial surveys in the summer of 1997 in Southeast Alaska (9844 km of survey effort), in the summer of 1998 in the Gulf of Alaska (10,127 km), and in the summer of 1999 in the Bering Sea (7849 km). Sightings of harbor porpoise during a beluga whale (Delphinapterus leucas) survey in 1998 (1355 km) provided data on harbor porpoise abundance in Cook Inlet for the Gulf of Alaska stock. Sightings by primary observers at side windows were compared to those of an independent observer at a belly window to estimate the probability of misidentification, underestimation of group size, and the probability that porpoise on the surface at the trackline were missed (perception bias, g(0)). There were 129, 96, and 201 sightings of harbor porpoises in the three stock areas, respectively. Both g(0) and effective strip width (the realized width of the survey track) depended on survey year, and g(0) also depended on the visibility reported by observers. Harbor porpoise abundance in 1997–99 was estimated at 11,146 animals for the Southeast Alaska stock, 31,046 animals for the Gulf of Alaska stock, and 48,515 animals for the Bering Sea stock.
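The quantities the abstract names — sightings, effort, effective strip width and the perception-bias correction g(0) — combine in the standard line-transect density estimator. A minimal sketch (all numbers hypothetical; the study's stratified, year- and visibility-dependent analysis is more involved):

```python
def line_transect_abundance(n_sightings, mean_group_size, effort_km,
                            esw_km, g0, area_km2):
    # Standard line-transect estimator:
    #   density   D = n * E[s] / (2 * L * ESW * g(0))
    #   abundance N = D * A
    # ESW is the effective strip half-width; g(0) corrects for animals
    # on the surface at the trackline that observers nevertheless miss.
    density = (n_sightings * mean_group_size) / (2.0 * effort_km * esw_km * g0)
    return density * area_km2
```

Because g(0) enters the denominator, underestimating it (assuming observers miss fewer animals than they do) inflates the abundance estimate, which is why the belly-window comparison described above matters.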
Abstract:
Stable isotope (SI) values of carbon (δ13C) and nitrogen (δ15N) are useful for determining the trophic connectivity between species within an ecosystem, but interpretation of these data involves important assumptions about sources of intrapopulation variability. We compared intrapopulation variability in δ13C and δ15N for an estuarine omnivore, Spotted Seatrout (Cynoscion nebulosus), to test assumptions and assess the utility of SI analysis for delineation of the connectivity of this species with other species in estuarine food webs. Both δ13C and δ15N values showed patterns of enrichment in fish caught from coastal to offshore sites and as a function of fish size. Results for δ13C were consistent in liver and muscle tissue, but liver δ15N showed a negative bias when compared with muscle that increased with absolute δ15N value. Natural variability in both isotopes was 5–10 times higher than that observed in laboratory populations, indicating that environmentally driven intrapopulation variability is detectable, particularly after individual bias is removed through sample pooling. These results corroborate the utility of SI analysis for examination of the position of Spotted Seatrout in an estuarine food web. On the basis of these results, we conclude that interpretation of SI data in fishes should account for measurable and ecologically relevant intrapopulation variability for each species and system on a case-by-case basis.
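Trophic interpretations of δ15N of the kind discussed above conventionally use the trophic-position formula of Post (2002). A minimal sketch, where the ~3.4‰ trophic enrichment factor and the example values are conventional assumptions rather than results from this study:

```python
def trophic_position(d15n_consumer, d15n_base, lambda_base=2.0, tef=3.4):
    # Post (2002): TP = lambda + (d15N_consumer - d15N_base) / TEF
    # lambda_base is the trophic level of the baseline organism
    # (2 for a primary consumer); TEF is the per-level enrichment
    # in per-mil, conventionally taken as ~3.4.
    return lambda_base + (d15n_consumer - d15n_base) / tef
```

The intrapopulation variability quantified in the study propagates directly into TP through the numerator, which is one reason tissue-specific biases (such as the liver–muscle δ15N offset reported here) need to be accounted for.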
Abstract:
Demersal groundfish densities were estimated by conducting a visual strip-transect survey via manned submersible on the continental shelf off Cape Flattery, Washington. The purpose of this study was to evaluate the statistical sampling power of the submersible survey as a tool to discriminate density differences between trawlable and untrawlable habitats. A geophysical map of the study area was prepared with side-scan sonar imagery, multibeam bathymetry data, and known locations of historical NMFS trawl survey events. Submersible transects were completed at randomly selected dive sites located in each habitat type. Significant differences in density between habitats were observed for lingcod (Ophiodon elongatus), yelloweye rockfish (Sebastes ruberrimus), and tiger rockfish (S. nigrocinctus) individually, and for “all rockfish” and “all flatfish” in the aggregate. Flatfish were more than ten times as abundant in the trawlable habitat samples as in the untrawlable samples, whereas rockfish as a group were over three times as abundant in the untrawlable habitat samples. Guidelines for sample sizes and implications for the estimation of trawl-survey habitat bias on the continental shelf are considered. We demonstrate an approach that can be used to establish sample size guidelines for future work by illustrating the interplay between statistical sampling power and 1) habitat-specific density differences, 2) variance of density differences, and 3) the proportion of untrawlable area in a habitat.
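The interplay between detectable density difference, variance and sample size can be sketched with the usual two-sample normal-approximation power calculation; this is a simplification of the study's analysis, and the function name and defaults are illustrative:

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.8):
    # Normal-approximation sample size for detecting a mean density
    # difference `delta` between two habitats with common SD `sigma`:
    #   n = 2 * (z_{1-alpha/2} + z_{power})^2 * (sigma / delta)^2
    # transects per habitat, rounded up.
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * (z_a + z_b) ** 2 * (sigma / delta) ** 2)
```

The quadratic dependence on sigma/delta shows why highly variable densities (typical of patchily distributed rockfish) demand disproportionately many transects to resolve a given habitat difference.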
Abstract:
Recent developments in modeling driver steering control with preview are reviewed. While some validation with experimental data has been presented, the rigorous application of formal system identification methods has not yet been attempted. This paper describes a steering controller based on linear model-predictive control. An indirect identification method that minimizes steering angle prediction error is developed. Special attention is given to filtering the prediction error so as to avoid identification bias that arises from the closed-loop operation of the driver-vehicle system. The identification procedure is applied to data collected from 14 test drivers performing double lane change maneuvers in an instrumented vehicle. It is found that the identification procedure successfully finds parameter values for the model that give small prediction errors. The procedure is also able to distinguish between the different steering strategies adopted by the test drivers. © 2006 IEEE.
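Prediction-error identification of the kind described can be illustrated on a toy problem: fit the parameters of a simple discrete-time model by minimizing the one-step-ahead steering prediction error in a least-squares sense. This sketch uses a first-order ARX model, not the paper's model-predictive driver controller or its closed-loop error filtering:

```python
def fit_arx(y, u):
    # One-step-ahead prediction-error fit of y_k = a*y_{k-1} + b*u_{k-1},
    # minimizing sum_k (y_k - a*y_{k-1} - b*u_{k-1})^2 by solving the
    # 2x2 normal equations in closed form (Cramer's rule).
    s_yy = s_yu = s_uu = s_y1 = s_u1 = 0.0
    for k in range(1, len(y)):
        p1, p2, t = y[k - 1], u[k - 1], y[k]
        s_yy += p1 * p1; s_yu += p1 * p2; s_uu += p2 * p2
        s_y1 += p1 * t;  s_u1 += p2 * t
    det = s_yy * s_uu - s_yu * s_yu
    a = (s_y1 * s_uu - s_u1 * s_yu) / det
    b = (s_u1 * s_yy - s_y1 * s_yu) / det
    return a, b
```

On noise-free data generated by the model itself the fit is exact; with closed-loop data the noise is correlated with the regressors, which is precisely the identification bias the paper's prediction-error filtering is designed to avoid.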
Abstract:
Many visual datasets are traditionally used to analyze the performance of different learning techniques. The evaluation is usually done within each dataset, so it is questionable whether such results are a reliable indicator of true generalization ability. We propose here an algorithm to exploit existing data resources when learning on a new multiclass problem. Our main idea is to identify an image representation that decomposes orthogonally into two subspaces: a part specific to each dataset, and a part generic to, and therefore shared between, all the considered source sets. This allows us to use the generic representation as unbiased reference knowledge for a novel classification task. By casting the method in the multi-view setting, we also make it possible to use different features for different databases. We call the algorithm MUST, Multitask Unaligned Shared knowledge Transfer. Through extensive experiments on five public datasets, we show that MUST consistently improves cross-dataset generalization performance. © 2013 Springer-Verlag.
Abstract:
Genetic variation at the serotonin transporter-linked polymorphic region (5-HTTLPR) is associated with altered amygdala reactivity and lack of prefrontal regulatory control. Similar regions mediate decision-making biases driven by contextual cues and ambiguity, for example the "framing effect." We hypothesized that individuals homozygous for the short (s) allele at the 5-HTTLPR would be more susceptible to framing. Participants, selected as homozygous for either the long (la) or s allele, performed a decision-making task where they made choices between receiving an amount of money for certain and taking a gamble. A strong bias was evident toward choosing the certain option when the option was phrased in terms of gains and toward gambling when the decision was phrased in terms of losses (the framing effect). Critically, this bias was significantly greater in the ss group than in the lala group. In simultaneously acquired functional magnetic resonance imaging data, the ss group showed greater amygdala activity during choices made in accord with, compared with those made counter to, the frame, an effect not seen in the lala group. These differences were also mirrored by differences in anterior cingulate-amygdala coupling between the genotype groups during decision making. Specifically, lala participants showed increased coupling during choices made counter to, relative to those made in accord with, the frame, with no such effect evident in ss participants. These data suggest that genetically mediated differences in prefrontal-amygdala interactions underpin interindividual differences in economic decision making.
Abstract:
An electro-optically (EO) modulated oxide-confined vertical-cavity surface-emitting laser (VCSEL) containing a saturable absorber in the VCSEL cavity is studied. The device contains an EO modulator section that is resonant with the VCSEL cavity. A type-II EO superlattice medium is employed in the modulator section and shown to produce a strong negative EO effect in weak electric fields. Applying reverse bias voltages to the EO section allows triggering of short pulses in the device. Digital data transmission (return-to-zero pseudo-random bit sequence, 2^7−1) at 10 Gb/s with bit-error rates well below 10^−9 is demonstrated. © 2014 AIP Publishing LLC.
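The 2^7−1 pattern mentioned is the standard PRBS-7 test sequence used in serial-link bit-error-rate measurements, conventionally generated by a 7-bit linear-feedback shift register with the x^7 + x^6 + 1 feedback polynomial. A minimal sketch:

```python
def prbs7(n_bits, seed=0x7F):
    # PRBS-7: maximal-length 7-bit LFSR (x^7 + x^6 + 1), period 2^7 - 1 = 127.
    # Any nonzero seed yields the same sequence up to a cyclic shift.
    state = seed & 0x7F
    out = []
    for _ in range(n_bits):
        new = ((state >> 6) ^ (state >> 5)) & 1  # taps at bit positions 7 and 6
        out.append(state & 1)                    # emit LSB of the current state
        state = ((state << 1) | new) & 0x7F      # shift left, feed back new bit
    return out
```

Over one full period the LFSR visits every nonzero 7-bit state exactly once, giving a balanced, spectrum-rich pattern suitable for stressing a 10 Gb/s link.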
Abstract:
It has been previously observed that thin film transistors (TFTs) utilizing an amorphous indium gallium zinc oxide (a-IGZO) semiconducting channel suffer from a threshold voltage shift when subjected to a negative gate bias and light illumination simultaneously. In this work, a thermalization energy analysis has been applied to previously published data on negative bias under illumination stress (NBIS) in a-IGZO TFTs. A barrier to defect conversion of 0.65–0.75 eV is extracted, which is consistent with reported energies of oxygen vacancy migration. The attempt-to-escape frequency is extracted to be 10^6–10^7 s^−1, which suggests a weak localization of carriers in band tail states over a 20–40 nm distance. Models for the NBIS mechanism based on charge trapping are reviewed and a defect pool model is proposed in which two distinct distributions of defect states exist in the a-IGZO band gap: these are associated with states that are formed as neutrally charged and 2+ charged oxygen vacancies at the time of film formation. In this model, threshold voltage shift is not due to a defect creation process, but to a change in the energy distribution of states in the band gap upon defect migration, as this allows a state formed as a neutrally charged vacancy to be converted into one formed as a 2+ charged vacancy and vice versa. Carrier localization close to the defect migration site is necessary for the conversion process to take place, and such defect migration sites are associated with conduction and valence band tail states. Under negative gate bias stressing, the conduction band tail is depleted of carriers, but the bias is insufficient to accumulate holes in the valence band tail states, and so no threshold voltage shift results. It is only under illumination that the quasi Fermi level for holes is sufficiently lowered to allow occupation of valence band tail states.
The resulting charge localization then allows a negative threshold voltage shift, but only under conditions of simultaneous negative gate bias and illumination, as observed experimentally as the NBIS effect. © 2014 AIP Publishing LLC.
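The extracted barrier and attempt-to-escape frequency are linked through the standard thermalization-energy relation E_th = k_B·T·ln(ν·t): defects whose conversion barriers lie below E_th have typically converted after stress time t. A minimal sketch (the stress time and temperature below are illustrative, not values from the paper):

```python
import math

K_B = 8.617e-5  # Boltzmann constant in eV/K

def thermalization_energy(temp_k, nu, t_s):
    # E_th = k_B * T * ln(nu * t): the barrier height below which a
    # thermally activated process with attempt frequency nu [1/s] has
    # typically occurred after stress time t [s] at temperature T [K].
    return K_B * temp_k * math.log(nu * t_s)
```

For example, with the paper's upper attempt frequency of 10^7 s^−1, a stress of 10^4 s at 300 K gives E_th ≈ 0.65 eV, the lower edge of the reported defect-conversion barrier range.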
Abstract:
Test strip detectors of 125 μm, 500 μm, and 1 mm pitch with areas of about 1 cm² have been made on medium-resistivity silicon wafers (1.3 and 2.7 kΩ·cm). Detectors of 500 μm pitch have been tested for charge collection and position precision before and after neutron irradiation (up to 2 × 10^14 n/cm²) using 820 and 1030 nm laser light with different beam-spot sizes. It has been found that at a bias of 250 V a strip detector made of 1.3 kΩ·cm material (300 μm thick) can be fully depleted both before and after an irradiation of 2 × 10^14 n/cm². For a 500 μm pitch strip detector made of 2.7 kΩ·cm material tested with 1030 nm laser light with a 200 μm spot size, the position reconstruction error is about 14 μm before irradiation, and 17 μm after an irradiation of about 1.7 × 10^13 n/cm². We demonstrated in this work that medium-resistivity silicon strip detectors can work just as well as the traditional high-resistivity ones, but with higher radiation tolerance. We also tested charge sharing and position reconstruction using a 1030 nm wavelength laser (300 μm absorption length in Si at RT), which provides a simulation of MIP particles in high-energy physics experiments in terms of charge collection and position reconstruction. © 1999 Elsevier Science B.V. All rights reserved.
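The quoted 250 V full-depletion bias is consistent with the textbook one-sided abrupt-junction estimate V_dep = q·N_d·d²/(2·ε_Si), with the effective doping inferred from resistivity via N_d = 1/(q·μ_n·ρ). A sketch under standard constants (n-type silicon and the tabulated electron mobility are assumptions, not stated in the abstract):

```python
Q_E = 1.602e-19    # elementary charge, C
EPS_SI = 1.04e-10  # permittivity of silicon, F/m (11.9 * eps0)
MU_N = 0.135       # electron mobility in silicon, m^2/(V s)

def full_depletion_voltage(rho_ohm_m, thickness_m):
    # Doping from resistivity: N_d = 1 / (q * mu_n * rho)
    # Full depletion voltage:  V_dep = q * N_d * d^2 / (2 * eps_Si)
    n_d = 1.0 / (Q_E * MU_N * rho_ohm_m)
    return Q_E * n_d * thickness_m ** 2 / (2.0 * EPS_SI)
```

For ρ = 1.3 kΩ·cm (13 Ω·m) and d = 300 μm this gives roughly 250 V, matching the bias at which the 1.3 kΩ·cm detectors are reported to deplete; higher-resistivity material depletes at proportionally lower voltage.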
Abstract:
It is estimated that the quantity of digital data being transferred, processed or stored at any one time currently stands at 4.4 zettabytes (4.4 × 2^70 bytes), and this figure is expected to have grown by a factor of 10, to 44 zettabytes, by 2020. Exploiting this data is, and will remain, a significant challenge. At present there is the capacity to store 33% of digital data in existence at any one time; by 2020 this capacity is expected to fall to 15%. These statistics suggest that, in the era of Big Data, the identification of important, exploitable data will need to be done in a timely manner. Systems for the monitoring and analysis of data, e.g. stock markets, smart grids and sensor networks, can be made up of massive numbers of individual components. These components can be geographically distributed yet may interact with one another via continuous data streams, which in turn may affect the state of the sender or receiver. This introduces a dynamic causality, which further complicates the overall system by introducing a temporal constraint that is difficult to accommodate. Practical approaches to realising the system described above have led to a multiplicity of analysis techniques, each of which concentrates on specific characteristics of the system being analysed and treats these characteristics as the dominant component affecting the results being sought. The multiplicity of analysis techniques introduces another layer of heterogeneity, that is, heterogeneity of approach, partitioning the field to the extent that results from one domain are difficult to exploit in another. The question asked is whether a generic solution for the monitoring and analysis of data can be identified that accommodates temporal constraints, bridges the gap between expert knowledge and raw data, and enables data to be effectively interpreted and exploited in a transparent manner.
The approach proposed in this dissertation acquires, analyses and processes data in a manner that is free of the constraints of any particular analysis technique, while at the same time facilitating these techniques where appropriate. Constraints are applied by defining a workflow based on the production, interpretation and consumption of data. This supports the application of different analysis techniques to the same raw data without the danger of incorporating hidden bias that may exist. To illustrate and to realise this approach, a software platform has been created that allows for the transparent analysis of data, combining analysis techniques with a maintainable record of provenance so that independent third-party analysis can be applied to verify any derived conclusions. In order to demonstrate these concepts, a complex real-world example involving the near real-time capturing and analysis of neurophysiological data from a neonatal intensive care unit (NICU) was chosen. A system was engineered to gather raw data, analyse that data using different analysis techniques, uncover information, incorporate that information into the system and curate the evolution of the discovered knowledge. The application domain was chosen for three reasons: firstly, because it is complex and no comprehensive solution exists; secondly, because it requires tight interaction with domain experts, thus requiring the handling of subjective knowledge and inference; and thirdly, because given the dearth of neurophysiologists, there is a real-world need to provide a solution for this domain.
Abstract:
Hydrologic research is a very demanding application of fiber-optic distributed temperature sensing (DTS) in terms of precision, accuracy and calibration. The physics behind the most frequently used DTS instruments is considered as it applies to four calibration methods for single-ended DTS installations. The new methods presented are more accurate than the instrument-calibrated data, achieving accuracies on the order of tenths of a degree root mean square error (RMSE) and mean bias. Effects of localized non-uniformities that violate the assumptions of single-ended calibration data are explored and quantified. Experimental design considerations such as selection of integration times or selection of the length of the reference sections are discussed, and the impacts of these considerations on calibrated temperatures are explored in two case studies.
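Single-ended Raman DTS calibration of the kind discussed is commonly written with three parameters: γ (related to the Raman energy shift), a lumped instrument constant C, and the differential attenuation Δα between the Stokes and anti-Stokes bands. A minimal sketch of that standard form, which is not necessarily the exact formulation used in the study:

```python
import math

def dts_temperature(z, p_stokes, p_antistokes, gamma, c, dalpha):
    # Standard single-ended DTS equation:
    #   T(z) = gamma / ( ln(P_S(z) / P_aS(z)) + C - dalpha * z )
    # gamma [K] is set by the Raman shift of the fiber, C collects
    # instrument constants, and dalpha [1/m] is the differential
    # attenuation of the Stokes vs anti-Stokes signals along the fiber.
    return gamma / (math.log(p_stokes / p_antistokes) + c - dalpha * z)
```

Calibration amounts to solving for γ, C and Δα from fiber sections held at known reference temperatures, which is why the length and placement of reference sections (discussed in the abstract) directly control the accuracy of the calibrated temperatures.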