Biblioteca Digital

949 resultados para monotone missing data

Attribute reduction and missing value imputing with ANN: prediction of learning disabilities

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Learning disability (LD) is a neurological condition that affects a child’s brain and impairs his ability to carry out one or many specific tasks. LD affects about 10% of children enrolled in schools. There is no cure for learning disabilities and they are lifelong. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Just as there are many different types of LDs, there are a variety of tests that may be done to pinpoint the problem The information gained from an evaluation is crucial for finding out how the parents and the school authorities can provide the best possible learning environment for child. This paper proposes a new approach in artificial neural network (ANN) for identifying LD in children at early stages so as to solve the problems faced by them and to get the benefits to the students, their parents and school authorities. In this study, we propose a closest fit algorithm data preprocessing with ANN classification to handle missing attribute values. This algorithm imputes the missing values in the preprocessing stage. Ignoring of missing attribute values is a common trend in all classifying algorithms. But, in this paper, we use an algorithm in a systematic approach for classification, which gives a satisfactory result in the prediction of LD. It acts as a tool for predicting the LD accurately, and good information of the child is made available to the concerned

Nichtüberlappende Gebietszerlegungsmethoden für lineare und quasilineare (monotone und nichtmonotone) Probleme

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In dieser Arbeit werden nichtüberlappende Gebietszerlegungsmethoden einerseits hinsichtlich der zu lösenden Problemklassen verallgemeinert und andererseits in bisher nicht untersuchten Kontexten betrachtet. Dabei stehen funktionalanalytische Untersuchungen zur Wohldefiniertheit, eindeutigen Lösbarkeit und Konvergenz im Vordergrund. Im ersten Teil werden lineare elliptische Dirichlet-Randwertprobleme behandelt, wobei neben Problemen mit dominantem Hauptteil auch solche mit singulärer Störung desselben, wie konvektions- oder reaktionsdominante Probleme zugelassen sind. Der zweite Teil befasst sich mit (gleichmäßig) monotonen koerziven quasilinearen elliptischen Dirichlet-Randwertproblemen. In beiden Fällen wird das Lipschitz-Gebiet in endlich viele Lipschitz-Teilgebiete zerlegt, wobei insbesondere Kreuzungspunkte und Teilgebiete ohne Außenrand zugelassen sind. Anschließend werden Transmissionsprobleme mit frei wählbaren $L^{\infty}$-Parameterfunktionen hergeleitet, wobei die Konormalenableitungen als Funktionale auf geeigneten Funktionenräumen über den Teilrändern ($H_{00}^{1/2}(\Gamma)$) interpretiert werden. Die iterative Lösung dieser Transmissionsprobleme mit einem Ansatz von Deng führt auf eine Substrukturierungsmethode mit Robin-artigen Transmissionsbedingungen, bei der eine Auswertung der Konormalenableitungen aufgrund einer geschickten Aufdatierung der Robin-Daten nicht notwendig ist (insbesondere ist die bekannte Robin-Robin-Methode von Lions als Spezialfall enthalten). Die Konvergenz bezüglich einer partitionierten $H^1$-Norm wird für beide Problemklassen gezeigt. Dabei werden keine über $H^1$ hinausgehende Regularitätsforderungen an die Lösungen gestellt und die Gebiete müssen keine zusätzlichen Glattheitsvoraussetzungen erfüllen. Im letzten Kapitel werden nichtmonotone koerzive quasilineare Probleme untersucht, wobei das Zugrunde liegende Gebiet nur in zwei Lipschitz-Teilgebiete zerlegt sein soll. Das zugehörige nichtlineare Transmissionsproblem wird durch Kirchhoff-Transformation in lineare Teilprobleme mit nichtlinearen Kopplungsbedingungen überführt. Ein optimierungsbasierter Lösungsansatz, welcher einen geeigneten Abstand der rücktransformierten Dirichlet-Daten der linearen Teilprobleme auf den Teilrändern minimiert, führt auf ein optimales Kontrollproblem. Die dabei entstehenden regularisierten freien Minimierungsprobleme werden mit Hilfe eines Gradientenverfahrens unter minimalen Glattheitsforderungen an die Nichtlinearitäten gelöst. Unter zusätzlichen Glattheitsvoraussetzungen an die Nichtlinearitäten und weiteren technischen Voraussetzungen an die Lösung des quasilinearen Ausgangsproblems, kann zudem die quadratische Konvergenz des Newton-Verfahrens gesichert werden.

Modelling structural zeros in compositional data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

This analysis was stimulated by the real data analysis problem of household expenditure data. The full dataset contains expenditure data for a sample of 1224 households. The expenditure is broken down at 2 hierarchical levels: 9 major levels (e.g. housing, food, utilities etc.) and 92 minor levels. There are also 5 factors and 5 covariates at the household level. Not surprisingly, there are a small number of zeros at the major level, but many zeros at the minor level. The question is how best to model the zeros. Clearly, models that try to add a small amount to the zero terms are not appropriate in general as at least some of the zeros are clearly structural, e.g. alcohol/tobacco for households that are teetotal. The key question then is how to build suitable conditional models. For example, is the sub-composition of spending excluding alcohol/tobacco similar for teetotal and non-teetotal households? In other words, we are looking for sub-compositional independence. Also, what determines whether a household is teetotal? Can we assume that it is independent of the composition? In general, whether teetotal will clearly depend on the household level variables, so we need to be able to model this dependence. The other tricky question is that with zeros on more than one component, we need to be able to model dependence and independence of zeros on the different components. Lastly, while some zeros are structural, others may not be, for example, for expenditure on durables, it may be chance as to whether a particular household spends money on durables within the sample period. This would clearly be distinguishable if we had longitudinal data, but may still be distinguishable by looking at the distribution, on the assumption that random zeros will usually be for situations where any non-zero expenditure is not small. While this analysis is based on around economic data, the ideas carry over to many other situations, including geological data, where minerals may be missing for structural reasons (similar to alcohol), or missing because they occur only in random regions which may be missed in a sample (similar to the durables)

Markov chain montecarlo method applied to rounding zeros of compositional data: first approach

Relevância:

30.00% 30.00%

Publicador:

Resumo:

As stated in Aitchison (1986), a proper study of relative variation in a compositional data set should be based on logratios, and dealing with logratios excludes dealing with zeros. Nevertheless, it is clear that zero observations might be present in real data sets, either because the corresponding part is completely absent –essential zeros– or because it is below detection limit –rounded zeros. Because the second kind of zeros is usually understood as “a trace too small to measure”, it seems reasonable to replace them by a suitable small value, and this has been the traditional approach. As stated, e.g. by Tauber (1999) and by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000), the principal problem in compositional data analysis is related to rounded zeros. One should be careful to use a replacement strategy that does not seriously distort the general structure of the data. In particular, the covariance structure of the involved parts –and thus the metric properties– should be preserved, as otherwise further analysis on subpopulations could be misleading. Following this point of view, a non-parametric imputation method is introduced in Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2000). This method is analyzed in depth by Martín-Fernández, Barceló-Vidal, and Pawlowsky-Glahn (2003) where it is shown that the theoretical drawbacks of the additive zero replacement method proposed in Aitchison (1986) can be overcome using a new multiplicative approach on the non-zero parts of a composition. The new approach has reasonable properties from a compositional point of view. In particular, it is “natural” in the sense that it recovers the “true” composition if replacement values are identical to the missing values, and it is coherent with the basic operations on the simplex. This coherence implies that the covariance structure of subcompositions with no zeros is preserved. As a generalization of the multiplicative replacement, in the same paper a substitution method for missing values on compositional data sets is introduced

Compositional Data Analysis with R

Relevância:

30.00% 30.00%

Publicador:

Resumo:

R from http://www.r-project.org/ is ‘GNU S’ – a language and environment for statistical computing and graphics. The environment in which many classical and modern statistical techniques have been implemented, but many are supplied as packages. There are 8 standard packages and many more are available through the cran family of Internet sites http://cran.r-project.org . We started to develop a library of functions in R to support the analysis of mixtures and our goal is a MixeR package for compositional data analysis that provides support for operations on compositions: perturbation and power multiplication, subcomposition with or without residuals, centering of the data, computing Aitchison’s, Euclidean, Bhattacharyya distances, compositional Kullback-Leibler divergence etc. graphical presentation of compositions in ternary diagrams and tetrahedrons with additional features: barycenter, geometric mean of the data set, the percentiles lines, marking and coloring of subsets of the data set, theirs geometric means, notation of individual data in the set . . . dealing with zeros and missing values in compositional data sets with R procedures for simple and multiplicative replacement strategy, the time series analysis of compositional data. We’ll present the current status of MixeR development and illustrate its use on selected data sets

Preparation of consistent soil data sets for modelling purposes: Secondary SOTER data for four case study areas

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The common GIS-based approach to regional analyses of soil organic carbon (SOC) stocks and changes is to define geographic layers for which unique sets of driving variables are derived, which include land use, climate, and soils. These GIS layers, with their associated attribute data, can then be fed into a range of empirical and dynamic models. Common methodologies for collating and formatting regional data sets on land use, climate, and soils were adopted for the project Assessment of Soil Organic Carbon Stocks and Changes at National Scale (GEFSOC). This permitted the development of a uniform protocol for handling the various input for the dynamic GEFSOC Modelling System. Consistent soil data sets for Amazon-Brazil, the Indo-Gangetic Plains (IGP) of India, Jordan and Kenya, the case study areas considered in the GEFSOC project, were prepared using methodologies developed for the World Soils and Terrain Database (SOTER). The approach involved three main stages: (1) compiling new soil geographic and attribute data in SOTER format; (2) using expert estimates and common sense to fill selected gaps in the measured or primary data; (3) using a scheme of taxonomy-based pedotransfer rules and expert-rules to derive soil parameter estimates for similar soil units with missing soil analytical data. The most appropriate approach varied from country to country, depending largely on the overall accessibility and quality of the primary soil data available in the case study areas. The secondary SOTER data sets discussed here are appropriate for a wide range of environmental applications at national scale. These include agro-ecological zoning, land evaluation, modelling of soil C stocks and changes, and studies of soil vulnerability to pollution. Estimates of national-scale stocks of SOC, calculated using SOTER methods, are presented as a first example of database application. Independent estimates of SOC stocks are needed to evaluate the outcome of the GEFSOC Modelling System for current conditions of land use and climate. (C) 2007 Elsevier B.V. All rights reserved.

Automated pre-processing strategies for species occurrence data used in biodiversity modelling

Relevância:

30.00% 30.00%

Publicador:

Resumo:

To construct Biodiversity richness maps from Environmental Niche Models (ENMs) of thousands of species is time consuming. A separate species occurrence data pre-processing phase enables the experimenter to control test AUC score variance due to species dataset size. Besides, removing duplicate occurrences and points with missing environmental data, we discuss the need for coordinate precision, wide dispersion, temporal and synonymity filters. After species data filtering, the final task of a pre-processing phase should be the automatic generation of species occurrence datasets which can then be directly ’plugged-in’ to the ENM. A software application capable of carrying out all these tasks will be a valuable time-saver particularly for large scale biodiversity studies.

Mixture models for capture-recapture count data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The contribution investigates the problem of estimating the size of a population, also known as the missing cases problem. Suppose a registration system is targeting to identify all cases having a certain characteristic such as a specific disease (cancer, heart disease, ...), disease related condition (HIV, heroin use, ...) or a specific behavior (driving a car without license). Every case in such a registration system has a certain notification history in that it might have been identified several times (at least once) which can be understood as a particular capture-recapture situation. Typically, cases are left out which have never been listed at any occasion, and it is this frequency one wants to estimate. In this paper modelling is concentrating on the counting distribution, e.g. the distribution of the variable that counts how often a given case has been identified by the registration system. Besides very simple models like the binomial or Poisson distribution, finite (nonparametric) mixtures of these are considered providing rather flexible modelling tools. Estimation is done using maximum likelihood by means of the EM algorithm. A case study on heroin users in Bangkok in the year 2001 is completing the contribution.

Extending Zelterman’s approach for robust estimation of population size to zero-truncated clustered data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Estimation of population size with missing zero-class is an important problem that is encountered in epidemiological assessment studies. Fitting a Poisson model to the observed data by the method of maximum likelihood and estimation of the population size based on this fit is an approach that has been widely used for this purpose. In practice, however, the Poisson assumption is seldom satisfied. Zelterman (1988) has proposed a robust estimator for unclustered data that works well in a wide class of distributions applicable for count data. In the work presented here, we extend this estimator to clustered data. The estimator requires fitting a zero-truncated homogeneous Poisson model by maximum likelihood and thereby using a Horvitz-Thompson estimator of population size. This was found to work well, when the data follow the hypothesized homogeneous Poisson model. However, when the true distribution deviates from the hypothesized model, the population size was found to be underestimated. In the search of a more robust estimator, we focused on three models that use all clusters with exactly one case, those clusters with exactly two cases and those with exactly three cases to estimate the probability of the zero-class and thereby use data collected on all the clusters in the Horvitz-Thompson estimator of population size. Loss in efficiency associated with gain in robustness was examined based on a simulation study. As a trade-off between gain in robustness and loss in efficiency, the model that uses data collected on clusters with at most three cases to estimate the probability of the zero-class was found to be preferred in general. In applications, we recommend obtaining estimates from all three models and making a choice considering the estimates from the three models, robustness and the loss in efficiency. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)

PDM 2006: Proceedings of the International Workshop on Data Mining

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Recently major processor manufacturers have announced a dramatic shift in their paradigm to increase computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single core CPUs, the trend clearly goes towards multi core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high performance computing systems, research on the corresponding algorithms must be on the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution focuses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for dis- tributed memory systems of the frequent subgraphs mining problem. This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational envi- ronments. The forth paper describes the combined use and the customization of software packages to facilitate a top down parallelism in the tuning of Support Vector Machines (SVM) and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that has provided detailed reviews for each paper. We would like to also thank Matthew Otey who helped with publicity for the workshop.

An Earth Observation Land Data Assimilation System (EO-LDAS)

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Current methods for estimating vegetation parameters are generally sub-optimal in the way they exploit information and do not generally consider uncertainties. We look forward to a future where operational dataassimilation schemes improve estimates by tracking land surface processes and exploiting multiple types of observations. Dataassimilation schemes seek to combine observations and models in a statistically optimal way taking into account uncertainty in both, but have not yet been much exploited in this area. The EO-LDAS scheme and prototype, developed under ESA funding, is designed to exploit the anticipated wealth of data that will be available under GMES missions, such as the Sentinel family of satellites, to provide improved mapping of land surface biophysical parameters. This paper describes the EO-LDAS implementation, and explores some of its core functionality. EO-LDAS is a weak constraint variational dataassimilationsystem. The prototype provides a mechanism for constraint based on a prior estimate of the state vector, a linear dynamic model, and EarthObservationdata (top-of-canopy reflectance here). The observation operator is a non-linear optical radiative transfer model for a vegetation canopy with a soil lower boundary, operating over the range 400 to 2500 nm. Adjoint codes for all model and operator components are provided in the prototype by automatic differentiation of the computer codes. In this paper, EO-LDAS is applied to the problem of daily estimation of six of the parameters controlling the radiative transfer operator over the course of a year (> 2000 state vector elements). Zero and first order process model constraints are implemented and explored as the dynamic model. The assimilation estimates all state vector elements simultaneously. This is performed in the context of a typical Sentinel-2 MSI operating scenario, using synthetic MSI observations simulated with the observation operator, with uncertainties typical of those achieved by optical sensors supposed for the data. The experiments consider a baseline state vector estimation case where dynamic constraints are applied, and assess the impact of dynamic constraints on the a posteriori uncertainties. The results demonstrate that reductions in uncertainty by a factor of up to two might be obtained by applying the sorts of dynamic constraints used here. The hyperparameter (dynamic model uncertainty) required to control the assimilation are estimated by a cross-validation exercise. The result of the assimilation is seen to be robust to missing observations with quite large data gaps.

Is missing orographic gravity wave drag near 60°S the cause of the stratospheric zonal wind biases in chemistry–climate models?

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Nearly all chemistry–climate models (CCMs) have a systematic bias of a delayed springtime breakdown of the Southern Hemisphere (SH) stratospheric polar vortex, implying insufficient stratospheric wave drag. In this study the Canadian Middle Atmosphere Model (CMAM) and the CMAM Data Assimilation System (CMAM-DAS) are used to investigate the cause of this bias. Zonal wind analysis increments from CMAMDAS reveal systematic negative values in the stratosphere near 608S in winter and early spring. These are interpreted as indicating a bias in the model physics, namely, missing gravity wave drag (GWD). The negative analysis increments remain at a nearly constant height during winter and descend as the vortex weakens, much like orographic GWD. This region is also where current orographic GWD parameterizations have a gap in wave drag, which is suggested to be unrealistic because of missing effects in those parameterizations. These findings motivate a pair of free-runningCMAMsimulations to assess the impact of extra orographicGWDat 608S. The control simulation exhibits the cold-pole bias and delayed vortex breakdown seen in the CCMs. In the simulation with extra GWD, the cold-pole bias is significantly reduced and the vortex breaks down earlier. Changes in resolved wave drag in the stratosphere also occur in response to the extra GWD, which reduce stratospheric SH polar-cap temperature biases in late spring and early summer. Reducing the dynamical biases, however, results in degraded Antarctic column ozone. This suggests that CCMs that obtain realistic column ozone in the presence of an overly strong and persistent vortex may be doing so through compensating errors.

The missing pillar: the creativity theory of knowledge spillover entrepreneurship

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Knowledge spillover theory of entrepreneurship and the prevailing theory of economic growth treat opportunities as endogenous and generally focus on opportunity recognition by entrepreneurs. New knowledge created endogenously results in knowledge spillovers enabling inventors and entrepreneurs to commercialize it. This article discusses that knowledge spillover entrepreneurship depends not only on ordinary human capital, but more importantly also on creativity embodied in creative individuals and diverse urban environments that attract creative classes. This might result in self-selection of creative individuals into entrepreneurship or enable entrepreneurs to recognize creativity and commercialize it. This creativity theory of knowledge spillover entrepreneurship is tested utilizing data on European cities.

Missing steps in a staircase: a qualitative study of the perspectives of key stakeholders on the use of adaptive designs in confirmatory trials

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Background Despite the promising benefits of adaptive designs (ADs), their routine use, especially in confirmatory trials, is lagging behind the prominence given to them in the statistical literature. Much of the previous research to understand barriers and potential facilitators to the use of ADs has been driven from a pharmaceutical drug development perspective, with little focus on trials in the public sector. In this paper, we explore key stakeholders’ experiences, perceptions and views on barriers and facilitators to the use of ADs in publicly funded confirmatory trials. Methods Semi-structured, in-depth interviews of key stakeholders in clinical trials research (CTU directors, funding board and panel members, statisticians, regulators, chief investigators, data monitoring committee members and health economists) were conducted through telephone or face-to-face sessions, predominantly in the UK. We purposively selected participants sequentially to optimise maximum variation in views and experiences. We employed the framework approach to analyse the qualitative data. Results We interviewed 27 participants. We found some of the perceived barriers to be: lack of knowledge and experience coupled with paucity of case studies, lack of applied training, degree of reluctance to use ADs, lack of bridge funding and time to support design work, lack of statistical expertise, some anxiety about the impact of early trial stopping on researchers’ employment contracts, lack of understanding of acceptable scope of ADs and when ADs are appropriate, and statistical and practical complexities. Reluctance to use ADs seemed to be influenced by: therapeutic area, unfamiliarity, concerns about their robustness in decision-making and acceptability of findings to change practice, perceived complexities and proposed type of AD, among others. Conclusions There are still considerable multifaceted, individual and organisational obstacles to be addressed to improve uptake, and successful implementation of ADs when appropriate. Nevertheless, inferred positive change in attitudes and receptiveness towards the appropriate use of ADs by public funders are supportive and are a stepping stone for the future utilisation of ADs by researchers.

Missing out? Challenges to hearing the views of all children on the barriers and supports to learning

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Children's views are essential to enabling schools to fulfil their duties under the Special Educational Needs and Disability Act 2001 and create inclusive learning environments. Arguably children are the best source of information about the ways in which schools support their learning and what barriers they encounter. Accessing this requires a deeper level of reflection than simply asking what children find difficult. It is also a challenge to ensure that the views of all children contribute including those who find communication difficult. Development work in five schools is drawn on to analyse the ways in which teachers used suggestions for three interview activities. The data reveals the strengths and limitations of different ways of supporting the communication process.

«
1
2
...
14
15
16
17
18
19
20
...
63
64
»