919 results for physics.data-an
Abstract:
The D0 experiment at Fermilab's Tevatron will record several petabytes of data over the next five years in pursuing the goals of understanding nature and searching for the origin of mass. The computing resources required to analyze these data far exceed the capabilities of any one institution. Moreover, the widely scattered geographical distribution of D0 collaborators poses further serious difficulties for optimal use of human and computing resources. These difficulties will be exacerbated in future high energy physics experiments, such as those at the LHC. The computing grid has long been recognized as a solution to these problems. This technology is being made a more immediate reality to end users in D0 by developing a grid in the D0 Southern Analysis Region (DOSAR), DOSAR-Grid, using the available resources within it and a home-grown local task manager, McFarm. We will present the architecture in which DOSAR-Grid is implemented, the technologies used and the functionality of the grid, and the experience of operating the grid in simulation, reprocessing, and data analysis for a currently running HEP experiment.
Abstract:
The tests that are currently available for the measurement of overexpression of the human epidermal growth factor receptor 2 (HER2) in breast cancer have shown considerable problems in accuracy and interlaboratory reproducibility. Although these problems are partly alleviated by the use of validated, standardised 'kits', there may be considerable cost involved in their use. Prior to testing, it may therefore be an advantage to be able to predict from basic pathology data whether a cancer is likely to overexpress HER2. In this study, we have correlated pathology features of cancers with the frequency of HER2 overexpression assessed by immunohistochemistry (IHC) using HercepTest (Dako). In addition, fluorescence in situ hybridisation (FISH) has been used to re-test the equivocal cancers, and interobserver variation in assessing HER2 overexpression has been examined by a slide circulation scheme. Of the 1536 cancers, 1144 (74.5%) did not overexpress HER2. Unequivocal overexpression (3+ by IHC) was seen in 186 cancers (12%) and an equivocal result (2+ by IHC) was seen in 206 cancers (13%). Of the 156 IHC 3+ cancers for which complete data were available, 149 (95.5%) were ductal NST and 152 (97%) were histological grade 2 or 3. Only 1 of 124 infiltrating lobular carcinomas (0.8%) showed HER2 overexpression. None of the 49 'special types' of carcinoma showed HER2 overexpression. Re-testing by FISH of a proportion of the IHC 2+ cancers showed that only 25 (23%) of those assessable exhibited HER2 gene amplification, but 46 of the 47 IHC 3+ cancers (98%) were confirmed as showing gene amplification. Circulating slides for the assessment of HER2 score showed a moderate level of agreement between pathologists (kappa 0.4). As a result of this study, we would advocate consideration of a triage approach to HER2 testing. Infiltrating lobular and special types of carcinoma may not need to be routinely tested at presentation, nor may grade 1 NST carcinomas, in which only 1.4% have been shown to overexpress HER2. Testing of these carcinomas may be performed when HER2 status is required to assist in therapeutic or other clinical/prognostic decision-making. The highest yield of HER2-overexpressing carcinomas is seen in the grade 3 NST subgroup, in which 24% are positive by IHC.
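For readers unfamiliar with the agreement statistic quoted above, the sketch below shows how Cohen's kappa is computed from a two-rater contingency table; the table of counts is hypothetical and purely illustrative, not the study's data.

```python
import numpy as np

# Hypothetical 3x3 contingency table of HER2 scores (0/1+, 2+, 3+)
# assigned by two pathologists to the same slides; counts are
# illustrative only and do not come from the study.
table = np.array([
    [60, 10,  2],
    [12, 25,  8],
    [ 3,  9, 21],
], dtype=float)

n = table.sum()
p_observed = np.trace(table) / n                              # exact agreement
p_expected = (table.sum(axis=1) @ table.sum(axis=0)) / n**2   # agreement expected by chance
kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"observed agreement {p_observed:.2f}, kappa {kappa:.2f}")
```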
Abstract:
For zygosity diagnosis in the absence of genotypic data, or in the recruitment phase of a twin study where only single twins from same-sex pairs are being screened, or to provide a test for sample duplication leading to the false identification of a dizygotic pair as monozygotic, the appropriate analysis of respondents' answers to questions about zygosity is critical. Using data from a young adult Australian twin cohort (N = 2094 complete pairs and 519 singleton twins from same-sex pairs with complete responses to all zygosity items), we show that application of latent class analysis (LCA), fitting a 2-class model, yields results in good concordance with traditional methods of zygosity diagnosis, but with certain important advantages. These include the ability, in many cases, to assign zygosity with specified probability on the basis of the responses of a single informant (advantageous when one zygosity type is being oversampled); and the ability to quantify the probability of misassignment of zygosity, allowing prioritization of cases for genotyping as well as identification of cases of probable laboratory error. Out of 242 twins (from 121 same-sex pairs) for whom genotypic data were available for zygosity confirmation, only a single case was identified of incorrect zygosity assignment by the latent class algorithm. Zygosity assignment for that single case was identified by the LCA as uncertain (probability of being a monozygotic twin only 76%), and the co-twin's responses clearly identified the pair as dizygotic (probability of being dizygotic 100%). In the absence of genotypic data, or as a safeguard against sample duplication, application of LCA for zygosity assignment or confirmation is strongly recommended.
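A minimal sketch of the kind of two-class latent class model described above, fitted by EM on binary questionnaire responses. The data, sample size, and number of items are synthetic assumptions; in practice zygosity would be assigned from the posterior class probabilities that the E-step produces.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical binary answers to zygosity questionnaire items
# (1 = "twin-alike" response), one row per respondent; synthetic data.
X = rng.integers(0, 2, size=(500, 4)).astype(float)

K, (n, m) = 2, X.shape
pi = np.full(K, 1.0 / K)                     # class weights (MZ-like / DZ-like)
theta = rng.uniform(0.3, 0.7, size=(K, m))   # P(item j = 1 | class k)

for _ in range(200):                         # EM for a 2-class latent class model
    # E-step: posterior class membership for each respondent
    log_lik = X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T + np.log(pi)
    post = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    # M-step: update class weights and item-response probabilities
    pi = post.mean(axis=0)
    theta = (post.T @ X) / post.sum(axis=0)[:, None]
    theta = theta.clip(1e-6, 1 - 1e-6)

print("class weights:", pi.round(3))
print("item probabilities by class:\n", theta.round(3))
```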
Abstract:
This paper provides empirical evidence that continuous-time models with one volatility factor are, under some conditions, able to fit the main characteristics of financial data. It also reports the importance of the feedback factor in capturing the strong volatility clustering of the data, caused by a possible change in the pattern of volatility in the last part of the sample. We use the Efficient Method of Moments (EMM) of Gallant and Tauchen (1996) to estimate logarithmic models with one and two stochastic volatility factors (with and without feedback) and to select among them.
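A schematic version of the model class being estimated, written here under an assumed generic logarithmic specification; the paper's exact parameterization may differ.

```latex
% Schematic two-factor logarithmic stochastic volatility model
% (notation and functional form assumed for illustration):
\begin{align*}
  \frac{dP_t}{P_t} &= \mu\,dt + \exp\!\big(\beta_0 + \beta_1 v_{1t} + \beta_2 v_{2t}\big)\,dW_{0t},\\
  dv_{it} &= -\alpha_i v_{it}\,dt + \big(1 + \gamma_i v_{it}\big)\,dW_{it}, \qquad i = 1, 2.
\end{align*}
```

Setting the second-factor loadings to zero gives the one-factor model, and setting the gamma terms to zero removes feedback; EMM then estimates the parameters by driving toward zero the expected score of an auxiliary model fitted to the observed returns.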
Abstract:
This paper develops a methodology to estimate entire population distributions from bin-aggregated sample data. We do this through the estimation of the parameters of mixtures of distributions that allow for maximal parametric flexibility. The statistical approach we develop enables comparisons of the full distributions of height data from potential army conscripts across France's 88 departments for most of the nineteenth century. These comparisons are made by testing for differences of means and for stochastic dominance. Corrections for possible measurement errors are also devised by taking advantage of the richness of the data sets. Our methodology is of interest to researchers working on historical as well as contemporary bin-aggregated or histogram-type data, a form in which much publicly available information is still released, often owing to political sensitivity and/or confidentiality concerns.
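As a minimal sketch of the underlying idea, the code below fits a two-component normal mixture to binned counts by maximizing the multinomial likelihood implied by the bin probabilities. The bin edges, counts, and starting values are hypothetical; the paper's mixtures are richer than this illustration.

```python
import numpy as np
from scipy import optimize, stats

# Hypothetical height bins (cm) and counts for one department; illustrative only.
edges  = np.array([150, 155, 160, 165, 170, 175, 180, 190], dtype=float)
counts = np.array([ 40, 180, 420, 610, 450, 190,  60], dtype=float)

def neg_log_lik(params):
    """Multinomial log-likelihood of binned counts under a 2-component normal mixture."""
    w, mu1, s1, mu2, s2 = params
    cdf = (w * stats.norm.cdf(edges, mu1, s1)
           + (1 - w) * stats.norm.cdf(edges, mu2, s2))
    p = np.diff(cdf)
    p /= p.sum()                     # condition on falling inside the observed bins
    return -np.sum(counts * np.log(np.clip(p, 1e-12, None)))

res = optimize.minimize(
    neg_log_lik,
    x0=[0.5, 160.0, 5.0, 172.0, 5.0],
    bounds=[(0.01, 0.99), (140, 200), (1, 20), (140, 200), (1, 20)],
)
print("weight, mu1, s1, mu2, s2 =", res.x.round(2))
```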
Abstract:
Background. The use of hospital discharge administrative data (HDAD) has been recommended for automating, improving, and even substituting for population-based cancer registries. The frequency of false positive and false negative cases calls for local validation. Methods. The aim of this study was to detect newly diagnosed, false positive, and false negative cases of cancer from hospital discharge claims, using four Spanish population-based cancer registries as the gold standard. Prostate cancer was used as a case study. Results. A total of 2286 incident cases of prostate cancer registered in 2000 were used for validation. In the most sensitive algorithm (the one using five diagnostic codes), sensitivity estimates ranged from 14.5% (95% CI 10.3-19.6) to 45.7% (95% CI 41.4-50.1). In the most predictive algorithm (the one using five diagnostic and five surgical codes), positive predictive value estimates ranged from 55.9% (95% CI 42.4-68.8) to 74.3% (95% CI 67.0-80.6). The most frequent source of false positives was prevalent cases incorrectly counted as newly diagnosed cancers, accounting for 61.1% to 82.3% of false positive cases. The most frequent source of false negatives was cases not attended in hospital settings, accounting for 34.4% to 69.7% of false negative cases in the most predictive algorithm. Conclusions. HDAD might be a helpful tool for cancer registries to reach their goals. The findings suggest that, for automating cancer registries, algorithms combining diagnoses and procedures are the best option. However, for cancer surveillance purposes, in cancers such as prostate cancer where care is not exclusively hospital-based, combining inpatient and outpatient information will be required.
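For reference, the two validation metrics quoted above are defined from the true positives (TP: registry cases also flagged by the claims algorithm), false negatives (FN: registry cases the algorithm missed), and false positives (FP: flagged cases not incident in the registry):

```latex
\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\mathrm{PPV} = \frac{TP}{TP + FP}.
```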
Abstract:
Foreign trade statistics are the main data source for the study of international trade. However, their accuracy has been under suspicion since Morgenstern published his famous work in 1963. Federico and Tena (1991) have revisited the question, arguing that the statistics can be useful at an adequate level of aggregation. But the geographical assignment problem remains unsolved. This article focuses on the spatial variable through an analysis of the reliability of international textile data for 1913. A geographical bias arises between export and import series, but given its limited quantitative importance it can be considered negligible at an international scale.
Abstract:
A general formalism is set up to analyze the response of an arbitrary solid elastic body to an arbitrary metric gravitational wave (GW) perturbation, which fully displays the details of the interaction between antenna and wave. The formalism is applied to the spherical detector, whose sensitivity parameters are thereby scrutinized. A multimode transfer function is defined to study the amplitude sensitivity, and absorption cross sections are calculated for a general metric theory of GW physics. Their scaling properties are shown to be independent of the underlying theory, with interesting consequences for future detector design. The problem of deconvolving the GW incidence direction is also discussed, always within the context of a general metric theory of the gravitational field.
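As a hedged sketch of the kind of formalism involved: in the detector frame the wave acts as a tidal force density proportional to the second time derivative of the metric perturbation, so each normal mode of the elastic body responds as a driven oscillator, schematically

```latex
% Schematic mode-amplitude equation for an elastic body driven by a GW
% (damping and mode normalization omitted; notation assumed for illustration):
\ddot{a}_N(t) + \omega_N^2\, a_N(t)
  = \tfrac{1}{2}\,\ddot{h}_{ij}(t) \int_V \rho\, u_{N,i}^{*}\, x_j \, d^3x ,
```

where the overlap integral between the mode shape and the tidal field determines how strongly each mode couples to the wave.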
Abstract:
In French the adjective petit 'small, little' has a special status: it fulfills various pragmatic functions in addition to semantic meanings and is thus highly frequent in discourse. This study, based on data from two children aged 1;6 to 2;11, argues that petit and its pragmatic meanings play a specific role in the acquisition of French adjectives. In contrast to what is expected in child language, petit favours the early development of a noun phrase pattern with a prenominal attributive adjective. The emergence and distribution of petit in the children's production is examined and related to its distribution in the input, and the detailed pragmatic meanings and functions of petit are analysed. Prenominal petit emerges early as the preferred and most productive adjective. Pragmatic meanings of petit appear to be predominant at this early age and are of two main types: expressions of endearment (in noun phrases) and mitigating devices whose scope is the entire utterance. These results, as well as instances of children's pragmatic overgeneralizations, provide new evidence that at least some pragmatic meanings are prior to semantic meanings in early acquisition.
Abstract:
Aerosols from anthropogenic and natural sources have been recognized as having an important impact on the climate system. However, the small size of aerosol particles (ranging from 0.01 to more than 10 μm in diameter) and their influence on solar and terrestrial radiation make them difficult to represent within the coarse resolution of general circulation models (GCMs), such that small-scale processes, for example sulfate formation and conversion, need to be parameterized. It is the parameterization of emissions, conversion, and deposition, and of the radiative effects of aerosol particles, that causes uncertainty in their representation within GCMs. The aim of this study was to perturb aspects of a sulfur cycle scheme used within a GCM to represent the climatological impacts of sulfate aerosol derived from natural and anthropogenic sulfur sources. It was found that perturbing volcanic SO2 emissions and the scavenging rate of SO2 by precipitation had the largest influence on the sulfate burden. When these parameters were perturbed, the sulfate burden ranged from 0.73 to 1.17 TgS for 2050 sulfur emissions (A2 scenario of the Special Report on Emissions Scenarios (SRES)), comparable with the range in sulfate burden across all the Intergovernmental Panel on Climate Change SRES scenarios. Thus, the results here suggest that the range in sulfate burden due to model uncertainty is comparable with scenario uncertainty. Despite the large range in sulfate burden, there was little influence on the climate sensitivity, which had a range of less than 0.5 K across the ensemble. We hypothesize that this small effect was partly associated with high sulfate loadings in the control phase of the experiment.
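A toy illustration of the perturbed-parameter ensemble logic described above. The parameter names, perturbation ranges, and the stand-in response function are invented for illustration; in the study each ensemble member is a full GCM integration, not a closed-form function.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical multiplicative perturbation ranges for two influential parameters.
params = {
    "volcanic_SO2_emission_scale": (0.5, 2.0),
    "SO2_precip_scavenging_scale": (0.5, 2.0),
}

def sulfate_burden(emiss_scale, scav_scale):
    """Stand-in for a GCM run: burden rises with emissions, falls with scavenging."""
    return 0.95 * emiss_scale / scav_scale   # TgS, toy response only

burdens = [
    sulfate_burden(rng.uniform(*params["volcanic_SO2_emission_scale"]),
                   rng.uniform(*params["SO2_precip_scavenging_scale"]))
    for _ in range(100)
]
print(f"ensemble burden range: {min(burdens):.2f} to {max(burdens):.2f} TgS")
```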
Abstract:
Multiple linear regression is used to diagnose the signal of the 11-yr solar cycle in zonal-mean zonal wind and temperature in the 40-yr ECMWF Re-Analysis (ERA-40) dataset. The results of previous studies are extended to 2008 using data from ECMWF operational analyses. This analysis confirms that the solar signal found in previous studies is distinct from that of volcanic aerosol forcing resulting from the eruptions of El Chichón and Mount Pinatubo, but it highlights the potential for confusion of the solar signal and lower-stratospheric temperature trends. A correction to an error that is present in previous results of Crooks and Gray, stemming from the use of a single daily analysis field rather than monthly averaged data, is also presented.
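A minimal sketch of this kind of regression diagnostic on synthetic data: the response is regressed on an intercept, a solar proxy, a volcanic aerosol index, and a linear trend. All series here are fabricated for illustration; analyses of this kind typically also include regressors such as the QBO and ENSO and account for autocorrelated residuals.

```python
import numpy as np

rng = np.random.default_rng(2)
n_months = 480                                           # 40 years of monthly data

# Illustrative regressors: an ~11-yr solar cycle proxy, a volcanic aerosol
# index, and a linear trend, plus an intercept column.
solar = np.sin(2 * np.pi * np.arange(n_months) / 132)
volc  = rng.exponential(0.1, n_months)
trend = np.linspace(-1, 1, n_months)
X = np.column_stack([np.ones(n_months), solar, volc, trend])

# Synthetic temperature anomaly with a prescribed solar signal plus noise.
y = 0.5 * solar - 2.0 * volc - 0.3 * trend + rng.normal(0, 0.5, n_months)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated [const, solar, volcanic, trend] coefficients:", beta.round(2))
```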
Abstract:
We propose a geoadditive negative binomial model (Geo-NB-GAM) for regional count data that allows us to address simultaneously some important methodological issues, such as spatial clustering, nonlinearities, and overdispersion. This model is applied to the study of location determinants of inward greenfield investments that occurred during 2003–2007 in 249 European regions. After presenting the data set and showing the presence of overdispersion and spatial clustering, we review the theoretical framework that motivates the choice of the location determinants included in the empirical model, and we highlight some reasons why the relationship between some of the covariates and the dependent variable might be nonlinear. The subsequent section first describes the solutions proposed by previous literature to tackle spatial clustering, nonlinearities, and overdispersion, and then presents the Geo-NB-GAM. The empirical analysis shows the good performance of Geo-NB-GAM. Notably, the inclusion of a geoadditive component (a smooth spatial trend surface) permits us to control for spatial unobserved heterogeneity that induces spatial clustering. Allowing for nonlinearities reveals, in keeping with theoretical predictions, that the positive effect of agglomeration economies fades as the density of economic activities reaches some threshold value. However, no matter how dense the economic activity becomes, our results suggest that congestion costs never overcome positive agglomeration externalities.
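The model described above can be sketched as follows; the notation is assumed for illustration and is not taken from the paper.

```latex
% Schematic Geo-NB-GAM specification:
y_i \sim \mathrm{NegBin}(\mu_i, \theta), \qquad
\log \mu_i = \mathbf{x}_i^{\top}\boldsymbol{\beta}
  + \sum_{j} f_j(z_{ij})
  + f_{\mathrm{spat}}(\mathrm{lon}_i, \mathrm{lat}_i).
```

Here the dispersion parameter accommodates overdispersion, the smooth functions f_j allow nonlinear covariate effects such as the fading agglomeration effect, and the smooth spatial trend surface f_spat absorbs spatially clustered unobserved heterogeneity.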
Abstract:
In this paper, we introduce a Bayesian analysis for multivariate survival data in the presence of a covariate vector and censored observations. Different 'frailties' or latent variables are considered to capture the correlation among the survival times for the same individual. We assume Weibull or generalized gamma distributions for the right-censored lifetime data. We develop the Bayesian analysis using Markov chain Monte Carlo (MCMC) methods.
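One common frailty specification of this kind, taking the Weibull case with a normally distributed log-frailty; the paper's exact priors and parameterization may differ.

```latex
% Sketch of a Weibull frailty model (parameterization assumed for illustration):
t_{ij} \mid w_i \sim \mathrm{Weibull}(\alpha, \lambda_{ij}), \qquad
\log \lambda_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + w_i, \qquad
w_i \sim \mathrm{N}(0, \sigma_w^2).
```

Right-censored observations contribute survival-function terms to the likelihood rather than density terms, and the posterior over the shape, regression, and frailty-variance parameters is explored by MCMC.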
Abstract:
GPS tracking of mobile objects provides spatial and temporal data for a broad range of applications, including traffic management and control and transportation routing and planning. Previous transport research has focused on GPS tracking data as an appealing alternative to travel diaries. Moreover, GPS-based data are gradually becoming a cornerstone for real-time traffic management. Vehicle tracking data from GPS devices are, however, susceptible to measurement errors, a neglected issue in transport research. By conducting a randomized experiment, we assess the reliability of GPS-based traffic data on geographical position, velocity, and altitude for three types of vehicles: bike, car, and bus. We find the geographical positioning reliable, but with an error greater than postulated by the manufacturer and a non-negligible risk of aberrant positioning. Velocity is slightly underestimated, whereas altitude measurements are unreliable.
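A minimal sketch of how horizontal positioning error can be quantified against a surveyed ground-truth point, using the standard haversine great-circle distance; the coordinates below are hypothetical, not the experiment's data.

```python
import numpy as np

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    r = 6_371_000.0                        # mean Earth radius, metres
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dp, dl = p2 - p1, np.radians(lon2 - lon1)
    a = np.sin(dp / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dl / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))

# Hypothetical GPS fixes versus a surveyed ground-truth point; values illustrative.
truth = (55.6761, 12.5683)
fixes = np.array([[55.67615, 12.56825],
                  [55.67605, 12.56840],
                  [55.67612, 12.56833]])
errors = haversine_m(fixes[:, 0], fixes[:, 1], *truth)
print(f"median horizontal error: {np.median(errors):.1f} m")
```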