39 resultados para alta risoluzione Trentino Alto Adige data-set climatologia temperatura giornaliera orografia complessa
em QUB Research Portal - Research Directory and Institutional Repository for Queen's University Belfast
Resumo:
We propose an exchange rate model that is a hybrid of the conventional specification with monetary fundamentals and the Evans–Lyons microstructure approach. We estimate a model augmented with order flow variables, using a unique data set: almost 100 monthly observations on interdealer order flow on dollar/euro and dollar/yen. The augmented macroeconomic, or “hybrid,” model exhibits greater in-sample stability and out of sample forecasting improvement vis-à-vis the basic macroeconomic and random walk specifications.
Resumo:
To overcome the weak evidence base coming from often poor and insufficient clinical research in older people, a minimum data set to achieve harmonisation is highly advisable. This will lead to uniform nomenclature and to the standardisation of the assessment tools. Our primary objective was to develop a Geriatric Minimum Data Set (GMDS) for clinical research.
Resumo:
his paper considers a problem of identification for a high dimensional nonlinear non-parametric system when only a limited data set is available. The algorithms are proposed for this purpose which exploit the relationship between the input variables and the output and further the inter-dependence of input variables so that the importance of the input variables can be established. A key to these algorithms is the non-parametric two stage input selection algorithm.
Resumo:
Support vector machine (SVM) is a powerful technique for data classification. Despite of its good theoretic foundations and high classification accuracy, normal SVM is not suitable for classification of large data sets, because the training complexity of SVM is highly dependent on the size of data set. This paper presents a novel SVM classification approach for large data sets by using minimum enclosing ball clustering. After the training data are partitioned by the proposed clustering method, the centers of the clusters are used for the first time SVM classification. Then we use the clusters whose centers are support vectors or those clusters which have different classes to perform the second time SVM classification. In this stage most data are removed. Several experimental results show that the approach proposed in this paper has good classification accuracy compared with classic SVM while the training is significantly faster than several other SVM classifiers.
Resumo:
A problem with use of the geostatistical Kriging error for optimal sampling design is that the design does not adapt locally to the character of spatial variation. This is because a stationary variogram or covariance function is a parameter of the geostatistical model. The objective of this paper was to investigate the utility of non-stationary geostatistics for optimal sampling design. First, a contour data set of Wiltshire was split into 25 equal sub-regions and a local variogram was predicted for each. These variograms were fitted with models and the coefficients used in Kriging to select optimal sample spacings for each sub-region. Large differences existed between the designs for the whole region (based on the global variogram) and for the sub-regions (based on the local variograms). Second, a segmentation approach was used to divide a digital terrain model into separate segments. Segment-based variograms were predicted and fitted with models. Optimal sample spacings were then determined for the whole region and for the sub-regions. It was demonstrated that the global design was inadequate, grossly over-sampling some segments while under-sampling others.
Resumo:
The WASP (wide angle search for planets) project is an exoplanet transit survey that has been automatically taking wide field images since 2004. Two instruments, one in La Palma and the other in South Africa, continually monitor the night sky, building up light curves of millions of unique objects. These light curves are used to search for the characteristics of exoplanetary transits. This first public data release (DR1) of the WASP archive makes available all the light curve data and images from 2004 up to 2008 in both the Northern and Southern hemispheres. A web interface () to the data allows easy access over the Internet. The data set contains 3 631 972 raw images and 17 970 937 light curves. In total the light curves have 119 930 299 362 data points available between them.
Resumo:
In many applications in applied statistics researchers reduce the complexity of a data set by combining a group of variables into a single measure using factor analysis or an index number. We argue that such compression loses information if the data actually has high dimensionality. We advocate the use of a non-parametric estimator, commonly used in physics (the Takens estimator), to estimate the correlation dimension of the data prior to compression. The advantage of this approach over traditional linear data compression approaches is that the data does not have to be linearized. Applying our ideas to the United Nations Human Development Index we find that the four variables that are used in its construction have dimension three and the index loses information.
Resumo:
High-dimensional gene expression data provide a rich source of information because they capture the expression level of genes in dynamic states that reflect the biological functioning of a cell. For this reason, such data are suitable to reveal systems related properties inside a cell, e.g., in order to elucidate molecular mechanisms of complex diseases like breast or prostate cancer. However, this is not only strongly dependent on the sample size and the correlation structure of a data set, but also on the statistical hypotheses tested. Many different approaches have been developed over the years to analyze gene expression data to (I) identify changes in single genes, (II) identify changes in gene sets or pathways, and (III) identify changes in the correlation structure in pathways. In this paper, we review statistical methods for all three types of approaches, including subtypes, in the context of cancer data and provide links to software implementations and tools and address also the general problem of multiple hypotheses testing. Further, we provide recommendations for the selection of such analysis methods.
Resumo:
1) Executive Summary
Legislation (Autism Act NI, 2011), a cross-departmental strategy (Autism Strategy 2013-2020) and a first action plan (2013-2016) have been developed in Northern Ireland in order to support individuals and families affected by Autism Spectrum Disorder (ASD) without a prior thorough baseline assessment of need. At the same time, there are large existing data sets about the population in NI that had never been subjected to a secondary data analysis with regards to data on ASD. This report covers the first comprehensive secondary data analysis and thereby aims to inform future policy and practice.
Following a search of all existing, large-scale, regional or national data sets that were relevant to the lives of individuals and families affected by Autism Spectrum Disorder (ASD) in Northern Ireland, extensive secondary data analyses were carried out. The focus of these secondary data analyses was to distill any ASD related data from larger generic data sets. The findings are reported for each data set and follow a lifespan perspective, i.e., data related to children is reported first before data related to adults.
Key findings:
Autism Prevalence:
Of children born in 2000 in the UK,
• 0.9% (1:109) were reported to have ASD, when they were 5-year old in 2005;
• 1.8% (1:55) were reported to have ASD, when they were 7-years old in 2007;
• 3.5% (1:29) were reported to have ASD, when they were 11-year old in 2011.
In mainstream schools in Northern Ireland
• 1.2% of the children were reported to have ASD in 2006/07;
• 1.8% of the children were reported to have ASD in 2012/13.
Economic Deprivation:
• Families of children with autism (CWA) were 9%-18% worse off per week than families of children not on the autism spectrum (COA).
• Between 2006-2013 deprivation of CWA compared to COA nearly doubled as measured by eligibility for free school meals (from near 20 % to 37%)
• In 2006, CWA and COA experienced similar levels of deprivation (approx. 20%), by 2013, a considerable deprivation gap had developed, with CWA experienced 6% more deprivation than COA.
• Nearly 1/3 of primary school CWA lived in the most deprived areas in Northern Ireland.
• Nearly ½ of children with Asperger’s Syndrome who attended special school lived in the most deprived areas.
Unemployment:
• Mothers of CWA were 6% less likely to be employed than mothers of COA.
• Mothers of CWA earned 35%-56% less than mothers of COA.
• CWA were 9% less likely to live in two income families than COA.
Health:
• Pre-diagnosis, CWA were more likely than COA to have physical health problems, including walking on level ground, speech and language, hearing, eyesight, and asthma.
• Aged 3 years of age CWA experienced poorer emotional and social health than COA, this difference increased significantly by the time they were 7 years of age.
• Mothers of young CWA had lower levels of life satisfaction and poorer mental health than mothers of young COA.
Education:
• In mainstream education, children with ASD aged 11-16 years reported less satisfaction with their social relationships than COA.
• Younger children with ASD (aged 5 and 7 years) were less likely to enjoy school, were bullied more, and were more reluctant to attend school than COA.
• CWA attended school 2-3 weeks less than COA .
• Children with Asperger’s Syndrome in special schools missed the equivalent of 8-13 school days more than children with Asperger’s Syndrome in mainstream schools.
• Children with ASD attending mainstream schooling were less likely to gain 5+ GCSEs A*-C or subsequently attend university.
Further and Higher Education:
• Enrolment rates for students with ASD have risen in Further Education (FE), from 0% to 0.7%.
• Enrolment rates for students with ASD have risen in Higher Education (HE), from 0.28% to 0.45%.
• Students with ASD chose to study different subjects than students without ASD, although other factors, e.g., gender, age etc. may have played a part in subject selection.
• Students with ASD from NI were more likely than students without ASD to choose Northern Irish HE Institutions rather than study outside NI.
Participation in adult life and employment:
• A small number of adults with ASD (n=99) have benefitted from DES employment provision over the past 12 years.
• It is unknown how many adults with ASD have received employment support elsewhere (e.g. Steps to Work).
•
Awareness and Attitudes in the General Population:
• In both the 2003 and 2012 NI Life and Times Survey (NILTS), NI public reported positive attitudes towards the inclusion of children with ASD in mainstream education (see also BASE Project Vol. 2).
Gap Analysis Recommendations:
This was the first comprehensive secondary analysis with regards to ASD of existing large-scale data sets in Northern Ireland. Data gaps were identified and further replications would benefit from the following data inclusion:
• ASD should be recorded routinely in the following datasets:
o Census;
o Northern Ireland Survey of Activity Limitation (NISALD);
o Training for Success/Steps to work; Steps to Success;
o Travel survey;
o Hate crime; and
o Labour Force Survey.
• Data should be collected on the destinations/qualifications of special school leavers.
• NILT Survey autism module should be repeated in 5 years time (2017) (see full report of 1st NILT Survey autism module 2012 in BASE Project Report Volume 2).
• General public attitudes and awareness should be assessed for children and young people, using the Young Life and Times Survey (YLT) and the Kids Life and Times Survey (KLT); (this work is underway, Dillenburger, McKerr, Schubolz, & Lloyd, 2014-2015).
Resumo:
In the study of complex genetic diseases, the identification of subgroups of patients sharing similar genetic characteristics represents a challenging task, for example, to improve treatment decision. One type of genetic lesion, frequently investigated in such disorders, is the change of the DNA copy number (CN) at specific genomic traits. Non-negative Matrix Factorization (NMF) is a standard technique to reduce the dimensionality of a data set and to cluster data samples, while keeping its most relevant information in meaningful components. Thus, it can be used to discover subgroups of patients from CN profiles. It is however computationally impractical for very high dimensional data, such as CN microarray data. Deciding the most suitable number of subgroups is also a challenging problem. The aim of this work is to derive a procedure to compact high dimensional data, in order to improve NMF applicability without compromising the quality of the clustering. This is particularly important for analyzing high-resolution microarray data. Many commonly used quality measures, as well as our own measures, are employed to decide the number of subgroups and to assess the quality of the results. Our measures are based on the idea of identifying robust subgroups, inspired by biologically/clinically relevance instead of simply aiming at well-separated clusters. We evaluate our procedure using four real independent data sets. In these data sets, our method was able to find accurate subgroups with individual molecular and clinical features and outperformed the standard NMF in terms of accuracy in the factorization fitness function. Hence, it can be useful for the discovery of subgroups of patients with similar CN profiles in the study of heterogeneous diseases.
Outperformance in exchange-traded fund pricing deviations: Generalized control of data snooping bias
Resumo:
An investigation into exchange-traded fund (ETF) outperforrnance during the period 2008-2012 is undertaken utilizing a data set of 288 U.S. traded securities. ETFs are tested for net asset value (NAV) premium, underlying index and market benchmark outperformance, with Sharpe, Treynor, and Sortino ratios employed as risk-adjusted performance measures. A key contribution is the application of an innovative generalized stepdown procedure in controlling for data snooping bias. We find that a large proportion of optimized replication and debt asset class ETFs display risk-adjusted premiums with energy and precious metals focused funds outperforming the S&P 500 market benchmark.
Resumo:
This paper presents a novel method of audio-visual fusion for person identification where both the speech and facial modalities may be corrupted, and there is a lack of prior knowledge about the corruption. Furthermore, we assume there is a limited amount of training data for each modality (e.g., a short training speech segment and a single training facial image for each person). A new representation and a modified cosine similarity are introduced for combining and comparing bimodal features with limited training data as well as vastly differing data rates and feature sizes. Optimal feature selection and multicondition training are used to reduce the mismatch between training and testing, thereby making the system robust to unknown bimodal corruption. Experiments have been carried out on a bimodal data set created from the SPIDRE and AR databases with variable noise corruption of speech and occlusion in the face images. The new method has demonstrated improved recognition accuracy.
Resumo:
This paper investigates the gene selection problem for microarray data with small samples and variant correlation. Most existing algorithms usually require expensive computational effort, especially under thousands of gene conditions. The main objective of this paper is to effectively select the most informative genes from microarray data, while making the computational expenses affordable. This is achieved by proposing a novel forward gene selection algorithm (FGSA). To overcome the small samples' problem, the augmented data technique is firstly employed to produce an augmented data set. Taking inspiration from other gene selection methods, the L2-norm penalty is then introduced into the recently proposed fast regression algorithm to achieve the group selection ability. Finally, by defining a proper regression context, the proposed method can be fast implemented in the software, which significantly reduces computational burden. Both computational complexity analysis and simulation results confirm the effectiveness of the proposed algorithm in comparison with other approaches
Resumo:
The proportion of elderly in the population has dramatically increased and will continue to do so for at least the next 50 years. Medical resources throughout the world are feeling the added strain of the increasing proportion of elderly in the population. The effective care of elderly patients in hospitals may be enhanced by accurately modelling the length of stay of the patients in hospital and the associated costs involved. This paper examines previously developed models for patient length of stay in hospital and describes the recently developed conditional phase-type distribution (C-Ph) to model patient duration of stay in relation to explanatory patient variables. The Clinics data set was used to demonstrate the C-Ph methodology. The resulting model highlighted a strong relationship between Barthel grade, patient outcome and length of stay showing various groups of patient behaviour. The patients who stay in hospital for a very long time are usually those that consume the largest amount of hospital resources. These have been identified as the patients whose resulting outcome is transfer. Overall, the majority of transfer patients spend a considerably longer period of time in hospital compared to patients who die or are discharged home. The C-Ph model has the potential for considering costs where different costs are attached to the various phases or subgroups of patients and the anticipated cost of care estimated in advance. It is hoped that such a method will lead to the successful identification of the most cost effective case-mix management of the hospital ward.