Abstract:
As more diagnostic testing options become available to physicians, it becomes more difficult to combine the various types of medical information in order to optimize the overall diagnosis. To improve diagnostic performance, we introduce an approach that optimizes a decision-fusion technique for combining heterogeneous information, such as that from different modalities, feature categories, or institutions. For classifier comparison we used two performance metrics: the area under the receiver operating characteristic (ROC) curve (AUC) and the normalized partial area under the curve (pAUC). This study used four classifiers: linear discriminant analysis (LDA), an artificial neural network (ANN), and two variants of our decision-fusion technique, AUC-optimized (DF-A) and pAUC-optimized (DF-P) decision fusion. We applied each of these classifiers with 100-fold cross-validation to two heterogeneous breast cancer data sets: one of mass lesion features and a much more challenging one of microcalcification lesion features. For the calcification data set, DF-A outperformed the other classifiers in terms of AUC (p < 0.02) and achieved AUC=0.85 +/- 0.01. The DF-P surpassed the other classifiers in terms of pAUC (p < 0.01) and reached pAUC=0.38 +/- 0.02. For the mass data set, DF-A outperformed both the ANN and the LDA (p < 0.04) and achieved AUC=0.94 +/- 0.01. Although for this data set there were no statistically significant differences among the classifiers' pAUC values (pAUC=0.57 +/- 0.07 to 0.67 +/- 0.05, p > 0.10), the DF-P did significantly improve specificity versus the LDA at both 98% and 100% sensitivity (p < 0.04). In conclusion, decision fusion directly optimized clinically significant performance measures, such as AUC and pAUC, and sometimes outperformed two well-known machine-learning techniques when applied to two different breast cancer data sets.
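The two performance metrics named in this abstract can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, the scores are made up, and the pAUC definition used here (the area between the ROC curve and a sensitivity floor of 0.9, divided by its largest possible value so that a perfect classifier scores 1) is an assumption, since the abstract does not state its normalization.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) from the strictest to the loosest threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sort cases by decreasing score so each step lowers the threshold.
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Cases with tied scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    """Trapezoidal area under the whole ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def normalized_pauc(points, tpr_floor=0.9):
    """Area between the ROC curve and the line TPR = tpr_floor, scaled to [0, 1].

    ASSUMPTION: this normalization is illustrative, not taken from the abstract.
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y1 <= tpr_floor:
            continue  # segment lies entirely at or below the floor
        if y0 < tpr_floor:
            # Clip the segment where it crosses the floor.
            x0 = x0 + (tpr_floor - y0) / (y1 - y0) * (x1 - x0)
            y0 = tpr_floor
        area += (x1 - x0) * ((y0 - tpr_floor) + (y1 - tpr_floor)) / 2.0
    return area / (1.0 - tpr_floor)


# Toy example: 1 = malignant, 0 = benign (made-up scores).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
pts = roc_points(scores, labels)
print("AUC  =", auc(pts))
print("pAUC =", normalized_pauc(pts))
```

Because the pAUC looks only at the high-sensitivity part of the curve, two classifiers with similar AUC values can still differ on it, which is why the two metrics can rank classifiers differently.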
Abstract:
BACKGROUND: Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses. FINDINGS: Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) A well annotated data set across species based on genome synteny; 2) Alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) Diverse data sets, including genes and their inferred trees, indels, and transposable elements. Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence. CONCLUSIONS: The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.
Abstract:
In this article, the buildingEXODUS (V1.1) evacuation model is described and discussed and attempts at qualitative and quantitative model validation are presented. The data set used for the validation is the Tsukuba pavilion evacuation data. This data set is of particular interest as the evacuation was influenced by external conditions, namely inclement weather. As part of the validation exercise, the sensitivity of the buildingEXODUS predictions to a range of variables and conditions is examined, including: exit flow capacity, occupant response times, and the impact of external conditions on the developing evacuation. The buildingEXODUS evacuation model was found to produce good qualitative and quantitative agreement with the experimental data.
Abstract:
Network analysis is distinguished from traditional social science by the dyadic nature of the standard data set. Whereas in traditional social science we study monadic attributes of individuals, in network analysis we study dyadic attributes of pairs of individuals. These dyadic attributes (e.g. social relations) may be represented in matrix form by a square 1-mode matrix. In contrast, the data in traditional social science are represented as 2-mode matrices. However, network analysis is not completely divorced from traditional social science, and often has occasion to collect and analyze 2-mode matrices. Furthermore, some of the methods developed in network analysis have uses in analysing non-network data. This paper presents and discusses ways of applying and interpreting traditional network analytic techniques to 2-mode data, as well as developing new techniques. Three areas are covered in detail: displaying 2-mode data as networks, detecting clusters and measuring centrality.
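One simple way to display 2-mode data as a network, sketched here as an illustration rather than as the paper's specific method, is to project the 2-mode matrix onto one of its modes by multiplying it by its transpose. The actor names, the attendance data, and the function names below are made up.

```python
def project_rows(two_mode):
    """Return the 1-mode matrix A * A^T: entry [i][j] counts the columns
    (e.g. events) shared by rows i and j (e.g. actors)."""
    n = len(two_mode)
    return [[sum(a * b for a, b in zip(two_mode[i], two_mode[j]))
             for j in range(n)]
            for i in range(n)]


def degree_centrality(one_mode):
    """Number of other rows each row is tied to (ignores the diagonal)."""
    return [sum(1 for j, w in enumerate(row) if j != i and w > 0)
            for i, row in enumerate(one_mode)]


# Hypothetical 2-mode data: rows are actors, columns are events attended.
actors = ["Ann", "Bob", "Cat", "Dan"]
attendance = [
    [1, 1, 0],  # Ann
    [1, 0, 0],  # Bob
    [0, 1, 1],  # Cat
    [0, 0, 1],  # Dan
]

co_attendance = project_rows(attendance)
for name, row in zip(actors, co_attendance):
    print(name, row)
print("degree:", degree_centrality(co_attendance))
```

The diagonal of the projected matrix holds the number of events each actor attended. The projection no longer records which events produced each tie, so some information in the 2-mode data is lost.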
Abstract:
An historical data set, collected in 1958 by Southward and Crisp, was used as a baseline for detecting change in the abundances of species in the rocky intertidal of Ireland. In 2003, the abundance of each of 27 species was assessed using the same methodologies (ACFOR [which stands for the categories: abundant, common, frequent, occasional and rare] abundance scales) at 63 shores examined in the historical study. Comparison of the ACFOR data over a 45-year period, between the historical survey and re-survey, showed statistically significant changes in the abundances of 12 of the 27 species examined. Two species (one classed as northern and one introduced) increased significantly in abundance while ten species (five classed as northern, one classed as southern and four broadly distributed) decreased in abundance. The possible reasons for the changes in species abundances were assessed not only in the context of anthropogenic effects, such as climate change and commercial exploitation, but also of operator error. The error or differences recorded among operators (i.e. research scientists) when assessing species abundance using ACFOR categories were quantified on four shores. Significant change detected in three of the 12 species fell within the margin of operator error. This effect of operator may have also contributed to the results of no change in the other 15 species between the two census periods. It was not possible to determine the effect of operator on our results, which can increase the occurrence of a false positive (Type 1) or of a false negative (Type 2) outcome.
Abstract:
As a response to public demand for a well-documented, quality controlled, publicly available, global surface ocean carbon dioxide (CO2) data set, the international marine carbon science community developed the Surface Ocean CO2 Atlas (SOCAT). The first SOCAT product is a collection of 6.3 million quality controlled surface CO2 data from the global oceans and coastal seas, spanning four decades (1968–2007). The SOCAT gridded data presented here is the second data product to come from the SOCAT project. Recognizing that some groups may have trouble working with millions of measurements, the SOCAT gridded product was generated to provide a robust, regularly spaced CO2 fugacity (fCO2) product with minimal spatial and temporal interpolation, which should be easier to work with for many applications. Gridded SOCAT is rich with information that has not been fully explored yet (e.g., regional differences in the seasonal cycles), but also contains biases and limitations that the user needs to recognize and address (e.g., local influences on values in some coastal regions).
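The idea of a regularly spaced product with minimal interpolation can be illustrated by simple bin averaging: each observation is assigned to a grid cell, and cell means are reported only where data exist. This is a sketch under assumed settings (a 1-degree grid, made-up fCO2 values, a hypothetical function name), not the SOCAT processing chain.

```python
import math
from collections import defaultdict


def grid_mean(observations, cell_deg=1.0):
    """Average point observations (lat, lon, value) into lat/lon cells.

    Cells without observations are simply absent: nothing is interpolated.
    Returns {(lat_index, lon_index): (mean, count)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in observations:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        sums[cell] += value
        counts[cell] += 1
    return {cell: (sums[cell] / counts[cell], counts[cell]) for cell in sums}


# Made-up surface observations: (latitude, longitude, fCO2 in microatmospheres).
obs = [
    (45.2, -30.7, 350.0),
    (45.8, -30.1, 354.0),
    (46.1, -30.5, 360.0),
]
gridded = grid_mean(obs)
for cell, (mean, n) in sorted(gridded.items()):
    print(cell, round(mean, 1), n)
```

Keeping the observation count alongside each cell mean lets a user judge how well sampled a cell is, which matters for the biases and limitations the abstract mentions.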
Abstract:
Interannual and seasonal trends of zooplankton abundance and species composition were compared between the Bongo net and Continuous Plankton Recorder (CPR) time series in the Gulf of Maine. Data from 5799 Bongo and 3118 CPR samples were compared from the years 1978–2006. The two programs use different sampling methods, with the Bongo time series composed of bimonthly vertically integrated samples from locations throughout the region, while the CPR was towed monthly at 10 m depth on a transect that bisects the region. It was found that there was a significant correlation between the interannual (r = 0.67, P < 0.01) and seasonal (r = 0.95, P < 0.01) variability of total zooplankton counts. Abundance rankings of individual taxa were highly correlated and temporal trends of dominant copepods were similar between samplers. Multivariate analysis also showed that both time series equally detected major shifts in community structure through time. However, absolute abundance levels were higher in the Bongo and temporal patterns for many of the less abundant taxa groups were not similar between the two devices. The different mesh sizes of the samplers probably caused some of the discrepancies, but diel migration patterns, damage to soft-bodied animals and avoidance of the small CPR aperture by some taxa likely contributed to the catch differences between the two devices. Nonetheless, Bongo data presented here confirm the previously published patterns found in the CPR data set, and both show that the abundance increase of the 1990s has been followed by average to below average levels from 2002 to 2006.
Abstract:
Novel techniques have been developed for increasing the value of cloud-affected sequences of Advanced Very High Resolution Radiometer (AVHRR) sea-surface temperature (SST) data and Sea-viewing Wide Field-of-view Sensor (SeaWiFS) ocean colour data for visualising dynamic physical and biological oceanic processes such as fronts, eddies and blooms. The proposed composite front map approach is to combine the location, strength and persistence of all fronts observed over several days into a single map, which allows intuitive interpretation of mesoscale structures. This method achieves a synoptic view without blurring dynamic features, an inherent problem with conventional time-averaging compositing methods. Objective validation confirms a significant improvement in feature visibility on composite maps compared to individual front maps. A further novel aspect is the automated detection of ocean colour fronts, correctly locating 96% of chlorophyll fronts in a test data set. A sizeable data set of 13,000 AVHRR and 1200 SeaWiFS scenes automatically processed using this technique is applied to the study of dynamic processes off the Iberian Peninsula such as mesoscale eddy generation, and many additional applications are identified. Front map animations provide a unique insight into the evolution of upwelling and eddies.
Abstract:
1. Understanding which environmental factors drive foraging preferences is critical for the development of effective management measures, but resource use patterns may emerge from processes that occur at different spatial and temporal scales. Direct observations of foraging are also especially challenging in marine predators, but passive acoustic techniques provide opportunities to study the behaviour of echolocating species over a range of scales. 2. We used an extensive passive acoustic data set to investigate the distribution and temporal dynamics of foraging in bottlenose dolphins using the Moray Firth (Scotland, UK). Echolocation buzzes were identified with a mixture model of detected echolocation inter-click intervals and used as a proxy of foraging activity. A robust modelling approach accounting for autocorrelation in the data was then used to evaluate which environmental factors were associated with the observed dynamics at two different spatial and temporal scales. 3. At a broad scale, foraging varied seasonally and was also affected by seabed slope and shelf-sea fronts. At a finer scale, we identified variation in seasonal use and local interactions with tidal processes. Foraging was best predicted at a daily scale, accounting for site specificity in the shape of the estimated relationships. 4. This study demonstrates how passive acoustic data can be used to understand foraging ecology in echolocating species and provides a robust analytical procedure for describing spatio-temporal patterns. Associations between foraging and environmental characteristics varied according to spatial and temporal scale, highlighting the need for a multi-scale approach. Our results indicate that dolphins respond to coarser scale temporal dynamics, but have a detailed understanding of finer-scale spatial distribution of resources.
Abstract:
Support vector machine (SVM) is a powerful technique for data classification. Despite its sound theoretical foundations and high classification accuracy, the standard SVM is not suitable for classifying large data sets, because its training complexity depends heavily on the size of the data set. This paper presents a novel SVM classification approach for large data sets that uses minimum enclosing ball clustering. After the training data are partitioned by the proposed clustering method, the cluster centers are used for a first round of SVM classification. The clusters whose centers are support vectors, and the clusters that contain data from different classes, are then used for a second round of SVM classification. In this stage most data are removed. Several experimental results show that the proposed approach achieves good classification accuracy compared with classic SVM, while training is significantly faster than with several other SVM classifiers.
Abstract:
A problem with use of the geostatistical Kriging error for optimal sampling design is that the design does not adapt locally to the character of spatial variation. This is because a stationary variogram or covariance function is a parameter of the geostatistical model. The objective of this paper was to investigate the utility of non-stationary geostatistics for optimal sampling design. First, a contour data set of Wiltshire was split into 25 equal sub-regions and a local variogram was predicted for each. These variograms were fitted with models and the coefficients used in Kriging to select optimal sample spacings for each sub-region. Large differences existed between the designs for the whole region (based on the global variogram) and for the sub-regions (based on the local variograms). Second, a segmentation approach was used to divide a digital terrain model into separate segments. Segment-based variograms were predicted and fitted with models. Optimal sample spacings were then determined for the whole region and for the sub-regions. It was demonstrated that the global design was inadequate, grossly over-sampling some segments while under-sampling others.
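The variogram at the heart of this design problem can be estimated from data with the classical method-of-moments estimator: half the mean squared difference between values separated by a given lag. The sketch below computes it for a regularly spaced 1-D transect, once globally and once per sub-region. The elevations are made up, the function name is hypothetical, and the paper's model fitting and Kriging steps are not reproduced.

```python
def semivariance(values, lag):
    """Method-of-moments semivariance for a regularly spaced transect:
    half the mean squared difference between values `lag` steps apart."""
    diffs = [(values[i + lag] - values[i]) ** 2
             for i in range(len(values) - lag)]
    return sum(diffs) / (2.0 * len(diffs))


# Made-up elevation transect: a smooth western half, a rough eastern half.
smooth = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5]
rough = [20.0, 26.0, 19.0, 27.0, 18.0, 28.0, 17.0, 29.0]
transect = smooth + rough

for name, segment in [("global", transect), ("smooth", smooth), ("rough", rough)]:
    print(name, [round(semivariance(segment, h), 2) for h in (1, 2, 3)])
```

In this toy case the global semivariance falls between the two local values, so a sample spacing derived from it would be denser than the smooth half needs and sparser than the rough half needs, which is the kind of mismatch the abstract reports.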
Abstract:
The WASP (wide angle search for planets) project is an exoplanet transit survey that has been automatically taking wide field images since 2004. Two instruments, one in La Palma and the other in South Africa, continually monitor the night sky, building up light curves of millions of unique objects. These light curves are used to search for the characteristics of exoplanetary transits. This first public data release (DR1) of the WASP archive makes available all the light curve data and images from 2004 up to 2008 in both the Northern and Southern hemispheres. A web interface to the data allows easy access over the Internet. The data set contains 3 631 972 raw images and 17 970 937 light curves. In total the light curves have 119 930 299 362 data points available between them.
Abstract:
In many applications in applied statistics researchers reduce the complexity of a data set by combining a group of variables into a single measure using factor analysis or an index number. We argue that such compression loses information if the data actually has high dimensionality. We advocate the use of a non-parametric estimator, commonly used in physics (the Takens estimator), to estimate the correlation dimension of the data prior to compression. The advantage of this approach over traditional linear data compression approaches is that the data does not have to be linearized. Applying our ideas to the United Nations Human Development Index we find that the four variables that are used in its construction have dimension three and the index loses information.
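A sketch of the Takens estimator may help here. For all pairwise distances r below a cutoff r0, the estimate is -1 divided by the mean of ln(r / r0). The cutoff, the sample sizes, and the synthetic data are assumptions chosen for illustration, and the function name is hypothetical; this is not the authors' analysis of the Human Development Index.

```python
import math
import random


def takens_dimension(points, r0):
    """Takens estimate of the correlation dimension from pairwise distances
    shorter than the cutoff r0 (zero distances are skipped)."""
    logs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            r = math.dist(points[i], points[j])
            if 0.0 < r < r0:
                logs.append(math.log(r / r0))
    return -len(logs) / sum(logs)


rng = random.Random(0)

# Four recorded variables that all depend on one hidden factor t ...
line = []
for _ in range(300):
    t = rng.random()
    line.append((t, 2.0 * t, -t, 0.5 * t))

# ... versus four variables driven by two independent factors s and t.
plane = []
for _ in range(300):
    s, t = rng.random(), rng.random()
    plane.append((s, t, s + t, s - t))

print("one hidden factor :", round(takens_dimension(line, r0=0.2), 2))
print("two hidden factors:", round(takens_dimension(plane, r0=0.2), 2))
```

Both synthetic data sets have four columns, but the estimates should come out near 1 and near 2 respectively. Compressing the second data set into a single index would therefore discard information, which is the argument the abstract makes for the four-variable index.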
Abstract:
High-dimensional gene expression data provide a rich source of information because they capture the expression level of genes in dynamic states that reflect the biological functioning of a cell. For this reason, such data are suitable to reveal systems related properties inside a cell, e.g., in order to elucidate molecular mechanisms of complex diseases like breast or prostate cancer. However, this is not only strongly dependent on the sample size and the correlation structure of a data set, but also on the statistical hypotheses tested. Many different approaches have been developed over the years to analyze gene expression data to (I) identify changes in single genes, (II) identify changes in gene sets or pathways, and (III) identify changes in the correlation structure in pathways. In this paper, we review statistical methods for all three types of approaches, including subtypes, in the context of cancer data and provide links to software implementations and tools and address also the general problem of multiple hypotheses testing. Further, we provide recommendations for the selection of such analysis methods.
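The multiple testing problem mentioned above arises because thousands of genes are tested at once. The abstract does not say which corrections the review covers; the Benjamini-Hochberg step-up procedure is shown below as one widely used option, with made-up p-values and a hypothetical function name.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    while controlling the false discovery rate at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Made-up p-values for six genes.
p = [0.001, 0.008, 0.012, 0.03, 0.27, 0.60]
print("Benjamini-Hochberg:", benjamini_hochberg(p))
print("Bonferroni        :", [pv <= 0.05 / len(p) for pv in p])
```

The Bonferroni line is included only for comparison: it controls the family-wise error rate and is usually more conservative than a false discovery rate procedure.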
Abstract:
1) Executive Summary
Legislation (Autism Act NI, 2011), a cross-departmental strategy (Autism Strategy 2013-2020) and a first action plan (2013-2016) have been developed in Northern Ireland in order to support individuals and families affected by Autism Spectrum Disorder (ASD) without a prior thorough baseline assessment of need. At the same time, there are large existing data sets about the population in NI that had never been subjected to a secondary data analysis with regard to ASD. This report covers the first comprehensive secondary data analysis and thereby aims to inform future policy and practice.
Following a search of all existing, large-scale, regional or national data sets that were relevant to the lives of individuals and families affected by Autism Spectrum Disorder (ASD) in Northern Ireland, extensive secondary data analyses were carried out. The focus of these secondary data analyses was to distill any ASD related data from larger generic data sets. The findings are reported for each data set and follow a lifespan perspective, i.e., data related to children is reported first before data related to adults.
Key findings:
Autism Prevalence:
Of children born in 2000 in the UK,
• 0.9% (1:109) were reported to have ASD when they were 5 years old in 2005;
• 1.8% (1:55) were reported to have ASD when they were 7 years old in 2007;
• 3.5% (1:29) were reported to have ASD when they were 11 years old in 2011.
In mainstream schools in Northern Ireland
• 1.2% of the children were reported to have ASD in 2006/07;
• 1.8% of the children were reported to have ASD in 2012/13.
Economic Deprivation:
• Families of children with autism (CWA) were 9%-18% worse off per week than families of children not on the autism spectrum (COA).
• Between 2006 and 2013, deprivation of CWA compared to COA nearly doubled as measured by eligibility for free school meals (from nearly 20% to 37%).
• In 2006, CWA and COA experienced similar levels of deprivation (approx. 20%); by 2013, a considerable deprivation gap had developed, with CWA experiencing 6% more deprivation than COA.
• Nearly 1/3 of primary school CWA lived in the most deprived areas in Northern Ireland.
• Nearly ½ of children with Asperger’s Syndrome who attended special school lived in the most deprived areas.
Unemployment:
• Mothers of CWA were 6% less likely to be employed than mothers of COA.
• Mothers of CWA earned 35%-56% less than mothers of COA.
• CWA were 9% less likely to live in two income families than COA.
Health:
• Pre-diagnosis, CWA were more likely than COA to have physical health problems, including problems with walking on level ground, speech and language, hearing, and eyesight, as well as asthma.
• At 3 years of age, CWA experienced poorer emotional and social health than COA; this difference had increased significantly by the time they were 7 years of age.
• Mothers of young CWA had lower levels of life satisfaction and poorer mental health than mothers of young COA.
Education:
• In mainstream education, children with ASD aged 11-16 years reported less satisfaction with their social relationships than COA.
• Younger children with ASD (aged 5 and 7 years) were less likely to enjoy school, were bullied more, and were more reluctant to attend school than COA.
• CWA attended school 2-3 weeks less than COA.
• Children with Asperger’s Syndrome in special schools missed the equivalent of 8-13 school days more than children with Asperger’s Syndrome in mainstream schools.
• Children with ASD attending mainstream schooling were less likely to gain 5+ GCSEs A*-C or subsequently attend university.
Further and Higher Education:
• Enrolment rates for students with ASD have risen in Further Education (FE), from 0% to 0.7%.
• Enrolment rates for students with ASD have risen in Higher Education (HE), from 0.28% to 0.45%.
• Students with ASD chose to study different subjects than students without ASD, although other factors, e.g., gender, age etc. may have played a part in subject selection.
• Students with ASD from NI were more likely than students without ASD to choose Northern Irish HE Institutions rather than study outside NI.
Participation in adult life and employment:
• A small number of adults with ASD (n=99) have benefitted from DES employment provision over the past 12 years.
• It is unknown how many adults with ASD have received employment support elsewhere (e.g. Steps to Work).
Awareness and Attitudes in the General Population:
• In both the 2003 and 2012 NI Life and Times Survey (NILTS), the NI public reported positive attitudes towards the inclusion of children with ASD in mainstream education (see also BASE Project Vol. 2).
Gap Analysis Recommendations:
This was the first comprehensive secondary analysis with regard to ASD of existing large-scale data sets in Northern Ireland. Data gaps were identified, and future replications would benefit from the inclusion of the following data:
• ASD should be recorded routinely in the following datasets:
o Census;
o Northern Ireland Survey of Activity Limitation (NISALD);
o Training for Success/Steps to work; Steps to Success;
o Travel survey;
o Hate crime; and
o Labour Force Survey.
• Data should be collected on the destinations/qualifications of special school leavers.
• NILT Survey autism module should be repeated in 5 years' time (2017) (see full report of 1st NILT Survey autism module 2012 in BASE Project Report Volume 2).
• General public attitudes and awareness should be assessed for children and young people, using the Young Life and Times Survey (YLT) and the Kids Life and Times Survey (KLT); (this work is underway, Dillenburger, McKerr, Schubolz, & Lloyd, 2014-2015).