797 results for Educational data mining
Abstract:
Accurate seasonal to interannual streamflow forecasts based on climate information are critical for optimal management and operation of water resources systems. Because most water supply systems are multipurpose, operating them to meet increasing demand under the growing stresses of climate variability and climate change, population and economic growth, and environmental concerns can be very challenging. This study investigated improvements in water resources systems management through the use of seasonal climate forecasts. Hydrological persistence (streamflow and precipitation) and large-scale recurrent oceanic-atmospheric patterns such as the El Niño/Southern Oscillation (ENSO), Pacific Decadal Oscillation (PDO), North Atlantic Oscillation (NAO), the Atlantic Multidecadal Oscillation (AMO), the Pacific North American (PNA) pattern, and customized sea surface temperature (SST) indices were investigated for their potential to improve streamflow forecast accuracy and increase forecast lead-time in a river basin in central Texas. First, an ordinal polytomous logistic regression approach is proposed as a means of incorporating multiple predictor variables into a probabilistic forecast model. Forecast performance is assessed through a cross-validation procedure, using distributions-oriented metrics, and implications for decision making are discussed. Results indicate that, of the predictors evaluated, only hydrologic persistence and Pacific Ocean sea surface temperature patterns associated with ENSO and PDO provide forecasts that are statistically better than climatology. Secondly, a class of data mining techniques, known as tree-structured models, is investigated to address the nonlinear dynamics of climate teleconnections and screen promising probabilistic streamflow forecast models for river-reservoir systems. Results show that the tree-structured models can effectively capture the nonlinear features hidden in the data. 
Skill scores of probabilistic forecasts generated by both classification trees and logistic regression trees indicate that seasonal inflows throughout the system can be predicted with sufficient accuracy to improve water management, especially in the winter and spring seasons in central Texas. Lastly, a simplified two-stage stochastic economic-optimization model was proposed to investigate improvement in water use efficiency and the potential value of using seasonal forecasts, under the assumption of optimal decision making under uncertainty. Model results demonstrate that incorporating the probabilistic inflow forecasts into the optimization model can provide a significant improvement in seasonal water contract benefits over climatology, with lower average deficits (increased reliability) for a given average contract amount, or improved mean contract benefits for a given level of reliability compared to climatology. The results also illustrate the trade-off between the expected contract amount and reliability, i.e., larger contracts can be signed at greater risk.
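The abstract does not spell out the two-stage model, so the following is only a toy sketch of the general idea, assuming a newsvendor-style setup: a contract amount is signed before the season (first stage) and a penalty is paid on any shortfall once the inflow is known (second stage). The scenario inflows, probabilities, `price`, and `penalty` are all hypothetical values chosen for illustration, not figures from the study.

```python
def expected_net_benefit(contract, inflows, probs, price=1.0, penalty=2.0):
    """Return (expected net benefit, expected deficit) for one contract amount.

    First stage: revenue of `price` per unit of water contracted.
    Second stage: `penalty` per unit of contracted water that the inflow cannot cover.
    """
    expected_deficit = sum(p * max(0.0, contract - q) for q, p in zip(inflows, probs))
    return price * contract - penalty * expected_deficit, expected_deficit


def best_contract(inflows, probs, candidates, price=1.0, penalty=2.0):
    """Pick the candidate contract with the highest expected net benefit."""
    return max(
        candidates,
        key=lambda c: expected_net_benefit(c, inflows, probs, price, penalty)[0],
    )


# Hypothetical dry / normal / wet inflow scenarios (arbitrary volume units).
inflows = [40.0, 70.0, 100.0]
climatology = [1 / 3, 1 / 3, 1 / 3]   # equal odds when no forecast is available
wet_forecast = [0.1, 0.3, 0.6]        # hypothetical forecast favouring a wet season
candidates = [float(c) for c in range(0, 101, 10)]

for name, probs in [("climatology", climatology), ("wet forecast", wet_forecast)]:
    c = best_contract(inflows, probs, candidates)
    benefit, deficit = expected_net_benefit(c, inflows, probs)
    print(name, "contract:", c, "benefit:", round(benefit, 2), "deficit:", round(deficit, 2))
```

In this toy setting a forecast that shifts probability toward wet conditions supports a larger contract, and the larger contract carries a larger expected deficit, which mirrors the contract-versus-reliability trade-off described above.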
Abstract:
The primary goal of this project is to demonstrate the practical use of data mining algorithms to cluster a solved steady-state computational fluid dynamics (CFD) flow domain into a simplified lumped-parameter network. A commercial-quality code, “cfdMine”, was created using volume-weighted k-means clustering that can cluster a 20 million cell CFD domain on a single CPU in several hours or less. Additionally, agglomeration and Mahalanobis k-means were added as optional post-processing steps to further enhance the separation of the clusters. The resultant nodal network is considered a reduced-order model and can be solved transiently at minimal computational cost. The reduced-order network is then instantiated in the commercial thermal solver MuSES to perform transient conjugate heat transfer, with convection predicted by the lumped network (based on steady-state CFD). When the lumped nodal network is inserted into a MuSES model, the potential for developing a “localized heat transfer coefficient” is shown to be an improvement over existing techniques. The clustering was also found to provide a new flow visualization technique. Finally, fixing clusters near equipment demonstrates a new capability to track temperatures near specific objects (such as equipment in vehicles).
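As a rough illustration of volume-weighted k-means, the sketch below runs Lloyd's algorithm with each centroid computed as the weighted mean of its members, using cell volume as the weight. It is not the cfdMine implementation: the feature vector per cell (here just two coordinates), the random initialization, and the function name `weighted_kmeans` are assumptions made for the example.

```python
import numpy as np


def weighted_kmeans(points, weights, k, n_iter=50, seed=0):
    """Lloyd's k-means in which centroids are weight-averaged (e.g. by cell volume)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct data points (an assumption of this sketch).
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)

    def assign(c):
        # Squared Euclidean distance from every point to every centroid: shape (n, k).
        d2 = ((points[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = centers.copy()
        for j in range(k):
            mask = labels == j
            if mask.any():  # leave an empty cluster's centroid where it was
                new_centers[j] = np.average(points[mask], axis=0, weights=weights[mask])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(centers), centers


# Four hypothetical cells: two on the left, two on the right; one left cell is larger.
cells = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
volumes = np.array([1.0, 3.0, 1.0, 1.0])
labels, centers = weighted_kmeans(cells, volumes, k=2)
print(labels, centers)
```

Because the centroid is a volume-weighted mean, the larger cell pulls the left cluster's centroid toward itself, which is the property that keeps large cells from being under-represented in the lumped network.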
Abstract:
The municipality of San Juan La Laguna, Guatemala, is home to approximately 5,200 people and is located on the western side of the Lake Atitlán caldera. Steep slopes surround all but the eastern side of San Juan. The Lake Atitlán watershed is susceptible to many natural hazards, the most predictable of which are the landslides that can occur annually with each rainy season, especially during high-intensity events. Hurricane Stan hit Guatemala in October 2005; the resulting flooding and landslides devastated the Atitlán region. Locations of landslide and non-landslide points were obtained from field observations and orthophotos taken following Hurricane Stan. This study used data on multiple attributes at every landslide and non-landslide point and applied different multivariate analyses to optimize a model for landslide prediction during high-intensity precipitation events like Hurricane Stan. The attributes considered in this study are: geology, geomorphology, distance to faults and streams, land use, slope, aspect, curvature, plan curvature, profile curvature and topographic wetness index. The attributes were pre-evaluated for their ability to predict landslides using four different attribute evaluators, all available in the open source data mining software Weka: filtered subset, information gain, gain ratio and chi-squared. Three multivariate algorithms (decision tree J48, logistic regression and BayesNet) were optimized for landslide prediction using different attributes. The following statistical parameters were used to evaluate model accuracy: precision, recall, F measure and area under the receiver operating characteristic (ROC) curve. The algorithm BayesNet yielded the most accurate model and was used to build a probability map of landslide initiation points. The probability map developed in this study was also compared to the results of a bivariate landslide susceptibility analysis conducted for the watershed, encompassing Lake Atitlán and San Juan. 
Landslides from Tropical Storm Agatha 2010 were used to independently validate this study’s multivariate model and the bivariate model. The ultimate aim of this study is to share the methodology and results with municipal contacts from the author's time as a U.S. Peace Corps volunteer, to facilitate more effective future landslide hazard planning and mitigation.
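The four accuracy measures named in this abstract can be written out in a few lines. The sketch below uses the standard textbook definitions rather than Weka's code, and the labels and scores are made-up values (1 = landslide point, 0 = non-landslide point) used only to exercise the functions.

```python
def precision_recall_f(y_true, y_pred):
    """Precision, recall and F measure for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def roc_auc(y_true, scores):
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    in which the positive example gets the higher score; ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical observations and predicted landslide probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption

print(precision_recall_f(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Precision, recall and F measure depend on the chosen probability threshold, whereas the ROC area summarizes ranking quality across all thresholds, which is one reason studies like this report both.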
Abstract:
This paper focuses on the integration of state-of-the-art technologies in the fields of telecommunications, simulation algorithms, and data mining in order to develop a semi- to fully automated monitoring and management system for Type 1 diabetes patients. The main components of the system are a glucose measurement device, an insulin delivery system (insulin injection or insulin pumps), a mobile phone for the GPRS network, and a PDA or laptop for the Internet. In the medical environment, appropriate infrastructure for storage, analysis and visualization of patients' data has been implemented to facilitate treatment design by health care experts.
Abstract:
This contribution describes the design, the feature set, and practical experience with the open-source e-learning platform Stud.IP. For each individual course, the features include schedules, uploading of term papers, discussion forums, personal homepages, chat rooms, and much more. The aim is to offer an infrastructure for teaching and learning that reflects the state of the art. Academic institutions also get a powerful environment for managing their staff, maintaining their websites, and automatically generating course or staff lists. Operators have access to a reliable support system that lets them take part in further development through the developer and operator community.
Abstract:
In this contribution we evaluate the didactic embedding of a CSCL application by means of log file analyses. As an example, we consider the use of the web-based system CommSy in a project-oriented course that we characterize as an open seminar. We obtain two results: (1) We give recommendations for shaping the context of use of a CSCL system and for supporting its initial and continued use. (2) We describe the analysis of occasions and patterns of use, as well as of user types, on the basis of log files. Log file analyses can serve to validate other evaluation results, but they can themselves be interpreted only in combination with additional information about the context of use.
Abstract:
We describe the use of log file analysis to investigate whether the use of CSCL applications corresponds to their didactical purposes. As an example, we examine the use of the web-based system CommSy as software support for project-oriented university courses. We present two findings: (1) We suggest measures to shape the context of CSCL applications and support their initial and continuous use. (2) We show how log files can be used to analyze how, when and by whom a CSCL system is used and thus help to validate further empirical findings. However, log file analyses can only be interpreted reasonably when additional data concerning the context of use are available.
Abstract:
Teaching is a dynamic activity. It can be very effective if its impact is constantly monitored and adjusted to the demands of changing social contexts and the needs of learners. This implies that teachers need to be aware of teaching and learning processes. Moreover, they should constantly question their didactical methods and the learning resources that they provide to their students. They should reflect on whether their actions are suitable, and they should regulate their teaching, e.g., by updating learning materials based on new knowledge about learners, or by motivating learners to engage in further learning activities. In recent years, interest in ‘learning analytics’ has risen noticeably. This interest is motivated by the availability of massive amounts of educational data. In addition, continuously increasing processing power and a strong motivation to discover new information in these pools of educational data are pushing further developments within the learning analytics research field. Learning analytics could be a method for reflective teaching practice that enables and guides teachers to investigate and evaluate their work in future learning scenarios. However, this potentially positive impact has not yet been sufficiently verified by learning analytics research. Another method that pursues these goals is ‘action research’. Learning analytics promises to initiate action research processes because it facilitates awareness, reflection and regulation of teaching activities analogous to action research. Therefore, this thesis joins both concepts in order to improve the design of learning analytics tools. The central research questions of this thesis are: What are the dimensions of learning analytics in relation to action research that need to be considered when designing a learning analytics tool? How does a learning analytics dashboard impact the teachers of technology-enhanced university lectures regarding ‘awareness’, ‘reflection’ and ‘action’? 
Does it initiate action research? What are the central requirements for a learning analytics tool that pursues such effects? This project followed design-based research principles in order to answer these research questions. The main contributions are: a theoretical reference model that connects action research and learning analytics, the conceptualization and implementation of a learning analytics tool, a requirements catalogue for useful and usable learning analytics design based on evaluations, a tested procedure for impact analysis, and guidelines for the introduction of learning analytics into higher education.
Abstract:
Well-known data mining algorithms rely on inputs in the form of pairwise similarities between objects. For large datasets it is computationally impossible to perform all pairwise comparisons. We therefore propose a novel approach that uses approximate Principal Component Analysis to efficiently identify groups of similar objects. The effectiveness of the approach is demonstrated in the context of binary classification using the supervised normalized cut as a classifier. For large datasets from the UCI repository, the approach significantly improves run times with minimal loss in accuracy.
Abstract:
Biodiversity, a multidimensional property of natural systems, is difficult to quantify partly because of the multitude of indices proposed for this purpose. Indices aim to describe general properties of communities that allow us to compare different regions, taxa, and trophic levels. Therefore, they are of fundamental importance for environmental monitoring and conservation, although there is no consensus about which indices are more appropriate and informative. We tested several common diversity indices in a range of simple to complex statistical analyses in order to determine whether some were better suited for certain analyses than others. We used data collected around the focal plant Plantago lanceolata on 60 temperate grassland plots embedded in an agricultural landscape to explore relationships between the common diversity indices of species richness (S), Shannon's diversity (H'), Simpson's diversity (D-1), Simpson's dominance (D-2), Simpson's evenness (E), and Berger-Parker dominance (BP). We calculated each of these indices for herbaceous plants, arbuscular mycorrhizal fungi, aboveground arthropods, belowground insect larvae, and P. lanceolata molecular and chemical diversity. Including these trait-based measures of diversity allowed us to test whether or not they behaved similarly to the better-studied species diversity. We used path analysis to determine whether compound indices detected more relationships between diversities of different organisms and traits than more basic indices. In the path models, more paths were significant when using H', even though all models except that with E were equally reliable. This demonstrates that while common diversity indices may appear interchangeable in simple analyses, when considering complex interactions, the choice of index can profoundly alter the interpretation of results. 
Data mining in order to identify the index producing the most significant results should be avoided, but simultaneously considering analyses using multiple indices can provide greater insight into the interactions in a system.
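For readers unfamiliar with the indices compared here, the sketch below computes them from a vector of abundance counts. The abstract gives no formulas, so the code uses common textbook forms; in particular, mapping "Simpson's diversity" to 1 − Σp², "Simpson's dominance" to 1/Σp², and "Simpson's evenness" to (1/Σp²)/S is an assumption of this example, and the counts are invented.

```python
import math


def diversity_indices(counts):
    """Common diversity indices from abundance counts (textbook forms, see lead-in)."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    props = [c / total for c in counts]
    sum_p2 = sum(p * p for p in props)
    richness = len(counts)
    return {
        "S": richness,                                  # species richness
        "H": -sum(p * math.log(p) for p in props),      # Shannon's diversity (natural log)
        "simpson_1_minus_D": 1.0 - sum_p2,              # assumed form of Simpson's diversity
        "simpson_inverse_D": 1.0 / sum_p2,              # assumed form of Simpson's dominance
        "E": (1.0 / sum_p2) / richness,                 # assumed form of Simpson's evenness
        "BP": max(props),                               # Berger-Parker dominance
    }


# Two hypothetical communities: one uneven, one perfectly even.
uneven = diversity_indices([50, 30, 20])
even = diversity_indices([10, 10, 10, 10])
print(uneven)
print(even)
```

The indices weight rare and dominant species differently, which is why they can rank the same set of plots differently once they are fed into more complex analyses such as the path models described above.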
Abstract:
NH···π hydrogen bonds occur frequently between the amino acid side groups in proteins and peptides. Data-mining studies of protein crystals find that ~80% of the T-shaped histidine···aromatic contacts are CH···π, and only ~20% are NH···π interactions. We investigated the infrared (IR) and ultraviolet (UV) spectra of the supersonic-jet-cooled imidazole·benzene (Im·Bz) complex as a model for the NH···π interaction between histidine and phenylalanine. Ground- and excited-state dispersion-corrected density functional calculations and correlated methods (SCS-MP2 and SCS-CC2) predict that Im·Bz has a Cs-symmetric T-shaped minimum-energy structure with an NH···π hydrogen bond to the Bz ring; the NH bond is tilted 12° away from the Bz C₆ axis. IR depletion spectra support the T-shaped geometry: the NH stretch vibrational fundamental is red shifted by −73 cm⁻¹ relative to that of bare imidazole at 3518 cm⁻¹, indicating a moderately strong NH···π interaction. While the S₀(A₁g) → S₁(B₂u) origin of benzene at 38 086 cm⁻¹ is forbidden in the gas phase, Im·Bz exhibits a moderately intense S₀ → S₁ origin, which appears via the D₆h → Cs symmetry lowering of Bz by its interaction with imidazole. The NH···π ground-state hydrogen bond is strong, De = 22.7 kJ/mol (1899 cm⁻¹). The combination of gas-phase UV and IR spectra confirms the theoretical predictions that the optimum Im·Bz geometry is T shaped and NH···π hydrogen bonded. We find no experimental evidence for a CH···π hydrogen-bonded ground-state isomer of Im·Bz. The optimum NH···π geometry of the Im·Bz complex is very different from the majority of the histidine·aromatic contact geometries found in protein database analyses, implying that the CH···π contacts observed in these searches do not arise from favorable binding interactions but merely from protein side-chain folding and crystal-packing constraints. The UV and IR spectra of the imidazole·(benzene)₂ cluster are observed via fragmentation into the Im·Bz⁺ mass channel. 
The spectra of Im·Bz and Im·Bz₂ are cleanly separable by IR hole burning. The UV spectrum of Im·Bz₂ exhibits two 0₀⁰ bands corresponding to the S₀ → S₁ excitations of the two inequivalent benzenes, which are symmetrically shifted by −86/+88 cm⁻¹ relative to the 0₀⁰ band of benzene.
Abstract:
Intensive family preservation services (IFPS), designed to stabilize at-risk families and avert out-of-home care, have been the focus of many randomized, experimental studies. Employing a retrospective “clinical data-mining” (CDM) methodology (Epstein, 2001), this study makes use of available information extracted from client records in one IFPS agency over the course of two years. The primary goal of this descriptive and associational study was to gain a clearer understanding of IFPS service delivery and effectiveness. Interventions provided to families are delineated and assessed for their impact on improved family functioning, their impact on the reduction of family violence, as well as placement prevention. Findings confirm the use of a wide range of services consistent with IFPS program theory. Because the study employs a quasi-experimental, retrospective use of available information, clinical outcomes described cannot be causally attributed to interventions employed as with randomized controlled trials. With regard to service outcomes, findings suggest that family education, empowerment services and advocacy are most influential in placement prevention and in ameliorating unmanageable behaviors in children as well as the incidence of family violence.