950 results for cross validation


Relevance: 60.00%

Abstract:

Despite extensive research on development in education and psychology, the methodology is seldom tested with real data. A major barrier to testing growth models is that such studies involve repeated observations and the growth itself is nonlinear. Repeated measurements under a nonlinear model require sophisticated statistical methods. In this study, we present a mixed-effects model with a negative exponential curve to describe the development of children's reading skills. This model captures the nature of growth in children's reading skills and accounts for both intra-individual and inter-individual variation. We also apply simple techniques, including cross-validation, regression, and graphical methods, to determine the most appropriate curve for the data, to find efficient initial values for the parameters, and to select potential covariates. We illustrate with the example that motivated this research: a longitudinal study of academic skills from grade 1 to grade 12 in Connecticut public schools.
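As a rough sketch of the mean structure of such a growth model, the negative exponential curve can be fit with SciPy; the grade/score values and starting values below are illustrative assumptions, and the per-child random effects of the full mixed-effects model are omitted.

```python
import numpy as np
from scipy.optimize import curve_fit

def neg_exp_growth(t, asymptote, start, rate):
    """Negative exponential growth: rises from `start` toward `asymptote`."""
    return asymptote - (asymptote - start) * np.exp(-rate * t)

# Illustrative reading scores by grade (assumed, not the Connecticut data).
grade = np.arange(1, 13, dtype=float)
score = np.array([120, 160, 195, 220, 240, 255, 265, 272, 278, 282, 285, 287],
                 dtype=float)

# Simple graphical/regression checks of the raw curve suggest rough initial
# values, echoing the paper's use of simple techniques to seed the fit.
p0 = [score.max(), score.min(), 0.3]
params, _ = curve_fit(neg_exp_growth, grade, score, p0=p0)
print(dict(zip(["asymptote", "start", "rate"], params.round(2))))
```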

Relevance: 60.00%

Abstract:

Mistreatment and self-neglect significantly increase the risk of death in older adults. An estimated 1 to 2 million older adults experience elder mistreatment and self-neglect every year in the United States. Currently, no elder mistreatment and self-neglect assessment tools have undergone construct validity and measurement invariance testing, and no studies have sought to identify underlying latent classes of elder self-neglect that may have differential mortality rates. Using data from 11,280 adults with Texas APS-substantiated elder mistreatment and self-neglect, 3 studies were conducted to: (1) test the construct validity of the Texas Adult Protective Services (APS) Client Assessment and Risk Evaluation (CARE) tool, (2) test its measurement invariance across gender and ethnicity, and (3) identify latent classes associated with elder self-neglect. Study 1 confirmed the construct validity of the CARE tool after adjustments to the initially hypothesized version, resulting in the deletion of 14 assessment items and a final assessment with the 5 original factors and 43 items. Cross-validation for this model was achieved. Study 2 provided empirical evidence for factor-loading and item-threshold invariance of the CARE tool across gender and between African-Americans and Caucasians. The financial status domain of the CARE tool did not function properly for Hispanics and thus had to be deleted; subsequent analyses showed factor-loading and item-threshold invariance across all 3 ethnic groups, with the exception of some residual errors. Study 3 identified 4 latent classes associated with elder self-neglect behaviors, comprising individuals with evidence of problems in (1) their environment, (2) physical and medical status, (3) multiple domains, and (4) finances. Overall, these studies provide evidence supporting the use of the APS CARE tool for unbiased and valid investigations of mistreatment and neglect in older adults with different demographic characteristics. Furthermore, the findings support the notion that elder self-neglect may not only occur along a continuum but that differential types may exist, both of which have important potential implications for the social and health services provided to vulnerable mistreated and neglected older adults.
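As a minimal sketch of the latent class idea behind Study 3, the following EM algorithm fits a latent class model with binary indicators; the item matrix, class count, and endorsement probabilities are invented for illustration and are not the CARE data.

```python
import numpy as np

def lca_em(X, n_classes, n_iter=200, seed=0):
    """EM for a latent class model with binary items (X is an n x p 0/1 matrix)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)         # class proportions
    theta = rng.uniform(0.25, 0.75, (n_classes, p))  # item endorsement probs
    for _ in range(n_iter):
        # E-step: posterior class membership for each respondent
        loglik = (X[:, None, :] * np.log(theta) +
                  (1 - X[:, None, :]) * np.log(1 - theta)).sum(axis=2)
        logpost = np.log(pi) + loglik
        logpost -= logpost.max(axis=1, keepdims=True)
        post = np.exp(logpost)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update class proportions and item probabilities
        pi = post.mean(axis=0)
        theta = ((post.T @ X) / post.sum(axis=0)[:, None]).clip(1e-6, 1 - 1e-6)
    return pi, theta, post

# Invented binary "assessment items" generated from two ground-truth classes.
rng = np.random.default_rng(1)
true_theta = np.array([[0.9, 0.8, 0.1, 0.1],
                       [0.2, 0.1, 0.85, 0.9]])
z = rng.integers(0, 2, 500)
X = (rng.uniform(size=(500, 4)) < true_theta[z]).astype(float)

pi, theta, _ = lca_em(X, n_classes=2)
print(pi.round(2), theta.round(2), sep="\n")
```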

Relevance: 60.00%

Abstract:

Radiomics is the high-throughput extraction and analysis of quantitative image features. For non-small cell lung cancer (NSCLC) patients, radiomics can be applied to standard-of-care computed tomography (CT) images to improve tumor diagnosis, staging, and response assessment. The first objective of this work was to show that CT image features extracted from pre-treatment NSCLC tumors could be used to predict tumor shrinkage in response to therapy. This is important because tumor shrinkage is an important cancer treatment endpoint correlated with the probability of disease progression and overall survival, and accurate prediction of tumor shrinkage could lead to individually customized treatment plans. To accomplish this objective, 64 stage NSCLC patients with similar treatments were all imaged using the same CT scanner and protocol. Quantitative image features were extracted, and principal component regression with simulated annealing subset selection was used to predict shrinkage. Cross-validation and permutation tests were used to validate the results. The optimal model gave a strong correlation between the observed and predicted shrinkages. The second objective of this work was to identify sets of NSCLC CT image features that are reproducible, non-redundant, and informative across multiple machines; feature sets with these qualities are needed for NSCLC radiomics models to be robust to machine variation and spurious correlation. To accomplish this objective, test-retest CT image pairs were obtained from 56 NSCLC patients imaged on three CT machines at two institutions. For each machine, quantitative image features with concordance correlation coefficient values greater than 0.90 were considered reproducible. Multi-machine reproducible feature sets were created by taking the intersection of the individual machines' reproducible feature sets, and redundant features were removed through hierarchical clustering. The findings showed that image feature reproducibility and redundancy depended on both the CT machine and the CT image type (average cine 4D-CT vs. end-exhale cine 4D-CT vs. helical inspiratory breath-hold 3D-CT). For each image type, a set of cross-machine reproducible, non-redundant, and informative image features was identified. Compared to end-exhale 4D-CT and breath-hold 3D-CT, image features derived from average 4D-CT showed superior multi-machine reproducibility and are the best candidates for clinical correlation.
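A sketch of the reproducibility filtering step, assuming hypothetical test-retest feature matrices: Lin's concordance correlation coefficient is computed per feature and per machine, and the multi-machine set is the intersection of the per-machine sets.

```python
import numpy as np

def concordance_cc(x, y):
    """Lin's concordance correlation coefficient between test and retest values."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (x.var() + y.var() + (mx - my) ** 2)

def reproducible_features(test, retest, threshold=0.90):
    """Indices of features whose test-retest CCC exceeds the threshold."""
    return {j for j in range(test.shape[1])
            if concordance_cc(test[:, j], retest[:, j]) > threshold}

# Illustrative test-retest feature matrices for three machines (assumed data:
# 56 patients, 20 features, small retest noise).
rng = np.random.default_rng(0)
machines = []
for _ in range(3):
    t = rng.normal(size=(56, 20))
    machines.append((t, t + rng.normal(scale=0.1, size=t.shape)))

# Multi-machine reproducible set = intersection of per-machine sets,
# mirroring the approach described above (redundancy clustering omitted).
sets = [reproducible_features(t, r) for t, r in machines]
print(sorted(set.intersection(*sets)))
```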

Relevance: 60.00%

Abstract:

Cervical cancer is the leading cause of death and disease from malignant neoplasms among women in developing countries. Even though the Pap smear has significantly decreased the number of deaths from cervical cancer over the years, it has its limitations. Researchers have developed an automated screening machine that can potentially detect abnormal cases overlooked by conventional screening; it is also much cheaper and potentially faster. The goal of quantitative cytology is to classify a patient's tissue sample based on quantitative measurements of the individual cells. One of the major challenges of collecting cells with a cytobrush is the possibility of not sampling any existing dysplastic cells on the cervix. Being able to correctly classify patients who have disease without the presence of dysplastic cells could improve the accuracy of quantitative cytology algorithms. Subtle morphologic changes in normal-appearing tissues adjacent to or distant from malignant tumors have been shown to exist, but a comparison of various statistical methods, including many recent advances in the statistical learning field, had not previously been done. The objective of this thesis is to apply different classification methods to quantitative cytology data for the detection of malignancy associated changes (MACs). In this thesis, Elastic Net performed best. To apply it to the test set, we combined the training and validation sets into a single training set and used 5-fold cross-validation to choose the Elastic Net tuning parameters. The resulting classifier has a sensitivity of 47% at 80% specificity, an AUC of 0.52, and a partial AUC of 0.10 (95% CI 0.09-0.11).
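A sketch of the tuning procedure with scikit-learn, assuming hypothetical feature matrices in place of the cytology data: elastic-net-penalized logistic regression with a 5-fold cross-validated grid search over the regularization parameters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import roc_auc_score

# Illustrative cytology-style data: rows are patients, columns are
# quantitative cell-feature summaries (assumed shapes, not the thesis data).
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(300, 40)), rng.integers(0, 2, 300)
X_test, y_test = rng.normal(size=(100, 40)), rng.integers(0, 2, 100)

# Elastic-net-penalized logistic regression; 5-fold CV picks the
# regularization strength C and the L1/L2 mixing weight l1_ratio.
grid = GridSearchCV(
    LogisticRegression(penalty="elasticnet", solver="saga", max_iter=5000),
    param_grid={"C": [0.01, 0.1, 1.0], "l1_ratio": [0.2, 0.5, 0.8]},
    cv=5, scoring="roc_auc",
)
grid.fit(X_train, y_train)
print("AUC on held-out test set:",
      roc_auc_score(y_test, grid.predict_proba(X_test)[:, 1]))
```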

Relevance: 60.00%

Abstract:

Accurate quantitative estimation of exposure using retrospective data has been one of the most challenging tasks in the exposure assessment field. To improve these estimates, models have been developed using published exposure databases together with their corresponding exposure determinants. These models are designed to be applied to exposure determinants reported by study subjects or to exposure levels assigned by an industrial hygienist, so that quantitative exposure estimates can be obtained. In an effort to improve the prediction accuracy and generalizability of such models, and considering that the limitations encountered in previous studies might stem from limitations in the applicability of traditional statistical methods and concepts, this study proposed and explored data analysis methods derived from computer science, predominantly machine learning approaches. The goal of this study was to develop a set of models using decision tree/ensemble and neural network methods to predict occupational exposure outcomes from literature-derived databases, and to compare, using cross-validation and data-splitting techniques, the resulting prediction capacity to that of traditional regression models. Two cases were addressed: the categorical case, where the exposure level was expressed as an exposure rating following the American Industrial Hygiene Association guidelines, and the continuous case, where the exposure was expressed as a concentration value. Previously developed literature-based exposure databases for 1,1,1-trichloroethane, methylene dichloride, and trichloroethylene were used. When compared to regression estimates, the results showed better accuracy for decision tree/ensemble techniques in the categorical case, while neural networks were better for estimating continuous exposure values. Overrepresentation of classes and overfitting were the main causes of poor neural network performance and accuracy. Estimation from literature-based databases using machine learning techniques might provide an advantage when applied within methodologies that combine expert inputs with current exposure measurements, such as the Bayesian Decision Analysis tool, and might represent a starting point toward independence from expert judgment.
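A sketch of the model comparison for the continuous case under assumed synthetic data: a random forest (ensemble), a small neural network, and ordinary regression are scored with cross-validated mean absolute error using scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative exposure data: determinants (year, process, controls, ...)
# encoded numerically; values are assumed, not the literature databases.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.5, size=200)  # concentration

models = {
    "regression": LinearRegression(),
    "ensemble": RandomForestRegressor(n_estimators=200, random_state=0),
    "neural net": make_pipeline(StandardScaler(),
                                MLPRegressor(hidden_layer_sizes=(16,),
                                             max_iter=2000, random_state=0)),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_mean_absolute_error")
    print(f"{name:>10}: cross-validated MAE = {-scores.mean():.3f}")
```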

Relevance: 60.00%

Abstract:

Production of Solanum tuberosum L., Lycopersicum esculentum Mill., and Physalis ixocarpa Brot. (Solanales: Solanaceae) has suffered heavy economic losses due to the presence of Bactericera cockerelli Sulc. (Hemiptera: Triozidae), owing to its association with the purple top ("punta morada") and "zebra chip" diseases and its role as vector of Candidatus Liberibacter solanacearum. The control measures used so far have lacked efficacy because the spatial distribution of the insect within the plot was unknown; knowing that behavior would make it possible to target control measures and thereby increase their effectiveness. The objective of this work was to model, using geostatistical tools, the spatial distribution of the egg, nymph, and adult stages of B. cockerelli sampled along transects in a potato crop. The results indicate that the spatial distribution of the egg, nymph, and adult populations of B. cockerelli was aggregated on every sampling date. Cross-validation of the fitted semivariograms corroborates the aggregated distribution of the B. cockerelli populations. The resulting maps display the aggregated structure of the insect populations and make it possible to identify infested and insect-free areas. Spatio-temporal stability was found for all three stages of the insect.
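A sketch of the first geostatistical step under assumed coordinates and counts: the empirical semivariogram (Matheron estimator), to which a model such as a spherical or exponential semivariogram would then be fit and cross-validated.

```python
import numpy as np

def empirical_semivariogram(coords, values, bins):
    """Empirical semivariance gamma(h) per distance bin (Matheron estimator)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2
    i, j = np.triu_indices(len(values), k=1)  # each pair counted once
    dist, semi = d[i, j], sq[i, j]
    return np.array([semi[(dist >= lo) & (dist < hi)].mean()
                     for lo, hi in zip(bins[:-1], bins[1:])])

# Illustrative transect data (assumed, not the field counts): egg counts at
# sampling points within a 100 m x 100 m plot.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(80, 2))
counts = rng.poisson(5, size=80).astype(float)
print(empirical_semivariogram(coords, counts, bins=np.linspace(0, 50, 6)))
```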

Relevance: 60.00%

Abstract:

Molecular methods provide promising tools for routine detection and quantification of toxic microalgae in plankton samples. To this end, novel TaqMan minor groove binding probes and primers targeting the small (SSU) or large (LSU) ribosomal subunit (rRNA) were developed for two species of the marine dinoflagellate genus Alexandrium (A. minutum, A. tamutum) and for three groups/ribotypes of the A. tamarense species complex: Group I/North American (NA), Group II/Mediterranean (ME) and Group III/Western European (WE). Primers and probes for real-time quantitative PCR (qPCR) were species-specific and highly efficient when tested in qPCR assays for cross-validation with pure DNA from cultured Alexandrium strains. Suitability of the qPCR assays as molecular tools for the detection and estimation of relative cell abundances of Alexandrium species and groups was evaluated from samples of natural plankton assemblages along the Scottish east coast. The results were compared with inverted microscope cell counts (Utermöhl technique) of Alexandrium spp. and associated paralytic shellfish poisoning (PSP) toxin concentrations. The qPCR assays indicated that A. tamarense (Group I) and A. tamutum were the most abundant Alexandrium taxa and both were highly positively correlated with PSP toxin content of plankton samples. Cells of A. tamarense (Group III) were present at nearly all stations but in low abundance. Alexandrium minutum and A. tamarense (Group II) cells were not detected in any of the samples, thereby arguing for their absence from the specific North Sea region, at least at the time of the survey. The sympatric occurrence of A. tamarense Group I and Group III gives further support to the hypothesis that the groups/ribotypes of the A. tamarense species complex are cryptic species rather than variants belonging to the same species.

Relevance: 60.00%

Abstract:

Secchi depth is a measure of water transparency. In the Baltic Sea region, Secchi depth maps are used to assess eutrophication and as input for habitat models. Because of their spatial and temporal coverage, satellite data would be the most suitable source for such maps, but the Baltic Sea's optical properties are so different from those of the open ocean that globally calibrated standard models suffer from large errors there. Regional predictive models that take the Baltic Sea's particular optical properties into account are thus needed. This paper tests how accurately generalized linear models (GLMs) and generalized additive models (GAMs), with MODIS/Aqua and auxiliary data as inputs, can predict Secchi depth at a regional scale. It uses cross-validation to test the prediction accuracy of hundreds of GAMs and GLMs with up to 5 input variables. A GAM with 3 input variables (chlorophyll a, remote sensing reflectance at 678 nm, and long-term mean salinity) made the most accurate predictions. Tested against field observations not used for model selection and calibration, the best model's mean absolute error (MAE) for daily predictions was 1.07 m (22%), more than 50% lower than that of other publicly available Baltic Sea Secchi depth maps; the MAE for predicting monthly averages was 0.86 m (15%). Thus, the proposed model selection process was able to find a regional model with good prediction accuracy. It could also be useful for finding predictive models for environmental variables other than Secchi depth, with data from other satellite sensors, and for other regions where non-standard remote sensing models are needed for prediction and mapping. Annual and monthly mean Secchi depth maps for 2003-2012 accompany this paper as Supplementary materials.
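A sketch of the model selection process under assumed data, with a plain linear model standing in for the paper's GAMs/GLMs: every predictor subset of up to 3 variables is scored by cross-validated MAE and the best subset is reported.

```python
import itertools
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Illustrative predictors: assumed stand-ins for the MODIS/Aqua and
# auxiliary inputs (chl-a, Rrs(678), salinity, ...), not the real data.
rng = np.random.default_rng(0)
names = ["chl_a", "rrs_678", "salinity", "sst", "cdom"]
X = rng.normal(size=(500, len(names)))
secchi = 5 + X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2] \
         + rng.normal(scale=0.5, size=500)

# Exhaustive search over predictor subsets, scored by cross-validated MAE.
best = None
for k in range(1, 4):
    for subset in itertools.combinations(range(len(names)), k):
        mae = -cross_val_score(LinearRegression(), X[:, subset], secchi,
                               cv=5, scoring="neg_mean_absolute_error").mean()
        if best is None or mae < best[0]:
            best = (mae, [names[j] for j in subset])
print(f"best subset {best[1]} with CV MAE {best[0]:.2f} m")
```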

Relevance: 60.00%

Abstract:

The morphology of ~45,000 bedforms from 13 multibeam bathymetry surveys was used as a proxy for identifying net bedload sediment transport directions and pathways throughout the San Francisco Bay estuary and adjacent outer coast. The spatially averaged shape asymmetry of the bedforms reveals distinct pathways of ebb and flood transport. Additionally, the region-wide, ebb-oriented asymmetry of 5% suggests net seaward-directed transport within the estuarine-coastal system, with significant seaward asymmetry at the mouth of San Francisco Bay (11%), through the northern reaches of the Bay (7-8%), and among the largest bedforms (21% for wavelengths λ > 50 m). This general indication of net sand transport to the open coast strongly suggests that anthropogenic removal of sediment from the estuary, particularly along clearly defined seaward transport pathways, will limit the supply of sand to chronically eroding, open-coast beaches. The bedform asymmetry measurements agree significantly (up to ~76%) with modeled annual residual transport directions derived from a hydrodynamically calibrated numerical model and with the orientation of adjacent, flow-sculpted seafloor features such as mega-flute structures, providing a comprehensive validation of the technique. The methods described in this paper for determining well-defined, cross-validated sediment transport pathways can be applied to estuarine-coastal systems globally wherever bedforms are present. The results can inform and improve regional sediment management practices, helping to use often limited sediment resources more efficiently and to mitigate current and future impacts related to sediment supply.
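One simple way to quantify bedform asymmetry from a bathymetric profile is sketched below; the profile is synthetic, and the asymmetry index (stoss minus lee length over their sum) is a common convention assumed here, not necessarily the paper's exact definition.

```python
import numpy as np
from scipy.signal import argrelextrema

def bedform_asymmetry(x, depth):
    """Per-bedform asymmetry: (stoss - lee) / (stoss + lee).

    Crests are local minima of depth (depth positive downward); the troughs
    on either side bound each bedform.
    """
    crests = argrelextrema(-depth, np.greater)[0]
    troughs = argrelextrema(depth, np.greater)[0]
    out = []
    for c in crests:
        left = troughs[troughs < c]
        right = troughs[troughs > c]
        if len(left) and len(right):
            stoss = x[c] - x[left[-1]]
            lee = x[right[0]] - x[c]
            out.append((stoss - lee) / (stoss + lee))
    return np.array(out)

# Synthetic asymmetric profile (assumed, not survey data).
x = np.linspace(0, 200, 2000)
depth = 10 + np.sin(2 * np.pi * x / 20) + 0.4 * np.sin(4 * np.pi * x / 20)
print("mean asymmetry:", bedform_asymmetry(x, depth).mean())
```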

Relevance: 60.00%

Abstract:

The eruption of Eyjafjallajökull volcano in 2010 lasted for 39 days, 14 April-23 May. The eruption had two explosive phases separated by a phase with lava formation and reduced explosive activity. The height of the plume was monitored every 5 min with a C-band weather radar located at Keflavík International Airport, 155 km from the volcano. In addition, several web cameras were mounted with a view of the volcano, and their images were saved every five seconds. Time series of the plume-top altitude were constructed from the radar observations and from the images of a web camera located in the village of Hvolsvöllur, 34 km from the volcano. This paper presents the independent radar and web camera time series and performs a cross-validation between them.
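A sketch of how two such independent plume-height series could be compared, assuming synthetic radar and camera series: both are put on a common 5-minute grid with pandas, then bias and correlation are computed.

```python
import numpy as np
import pandas as pd

# Illustrative plume-height series (assumed values, km above sea level):
# radar samples every 5 min, camera estimates at irregular times.
t0 = pd.Timestamp("2010-04-14 12:00")
radar = pd.Series(5 + np.random.default_rng(0).normal(0, 0.3, 48),
                  index=pd.date_range(t0, periods=48, freq="5min"))
cam_times = np.sort(np.random.default_rng(2).uniform(0, 240, 120))
camera = pd.Series(5 + np.random.default_rng(1).normal(0.2, 0.3, 120),
                   index=t0 + pd.to_timedelta(cam_times, unit="min"))

# Put both series on a common 5-minute grid, then compare.
common = pd.DataFrame({"radar": radar,
                       "camera": camera.resample("5min").mean()}).dropna()
print("bias (camera - radar):", (common.camera - common.radar).mean())
print("correlation:", common.radar.corr(common.camera))
```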

Relevance: 60.00%

Abstract:

The present data set was used as a training set for a habitat suitability model. It contains occurrence (presence-only) records of living Lophelia pertusa reefs on the Irish continental margin, assembled from databases, cruise reports, and publications. A total of 4,423 records were inspected and quality-assessed to ensure that they (1) represented confirmed living L. pertusa reefs (thus excluding 2,900 records of dead reefs and isolated coral colonies); (2) were derived from sampling equipment that allows accurate (<200 m) geo-referencing (thus excluding 620 records derived mainly from trawling and dredging activities); and (3) were not duplicated. A total of 245 occurrences were retained for the analysis. Coral observations are highly clustered in regions targeted by research expeditions, which might lead to falsely inflated model evaluation measures (Veloz, 2009). Therefore, we coarsened the distribution data by deleting all but one record within grid cells of 0.02° resolution (Davies & Guinotte 2011). The remaining 53 points were subjected to a spatial cross-validation process: a random presence point was chosen, grouped with its 12 closest neighbouring presence points based on Euclidean distance, and withheld from model training. This process was repeated for all records, resulting in 53 replicates of spatially non-overlapping sets of test (n=13) and training (n=40) data. The final 53 occurrence records were used for model training.
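The replicate-building step translates directly into code; a sketch with SciPy's k-d tree under assumed presence coordinates:

```python
import numpy as np
from scipy.spatial import cKDTree

# Illustrative presence coordinates (lon, lat); assumed, not the real records.
rng = np.random.default_rng(0)
presences = rng.uniform([-14, 50], [-10, 55], size=(53, 2))

# Spatial cross-validation as described: each replicate withholds one presence
# point plus its 12 nearest neighbours as the test set (n=13), training on
# the remaining 40.
tree = cKDTree(presences)
replicates = []
for seed_idx in rng.permutation(len(presences)):
    _, neighbours = tree.query(presences[seed_idx], k=13)  # includes itself
    test = set(neighbours.tolist())
    train = [i for i in range(len(presences)) if i not in test]
    replicates.append((sorted(test), train))

print(len(replicates), "replicates; first test set:", replicates[0][0])
```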

Relevance: 60.00%

Abstract:

This work explores the automatic recognition of physical activity intensity patterns from multi-axial accelerometry and heart rate signals. Data collection was carried out in free-living conditions and in three controlled gymnasium circuits, for a total of 179.80 h of data divided into sedentary situations (65.5%), light-to-moderate activity (17.6%), and vigorous exercise (16.9%). The proposed machine learning pipeline comprises the following steps: time-domain feature definition, standardization and PCA projection, unsupervised clustering (by k-means and GMM), and an HMM to account for long-term temporal trends. Performance was evaluated by 30 runs of a 10-fold cross-validation. Both the k-means and GMM-based approaches yielded high overall accuracy (86.97% and 85.03%, respectively) and, given the imbalance of the dataset, meritorious F-measures (up to 77.88%) for non-sedentary cases. Classification errors tended to be concentrated around transients, which limits their practical impact. Hence, we consider our proposal suitable for 24-hour monitoring of physical activity in ambulatory scenarios and a first step towards intensity-specific energy expenditure estimators.
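A sketch of the clustering stages with scikit-learn under assumed windowed features; the HMM stage that smooths long-term temporal trends and the 30x10-fold evaluation are omitted for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative windowed features from accelerometry + heart rate
# (assumed: rows are time windows, columns are time-domain features).
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 12))

# Standardization -> PCA projection -> unsupervised k-means clustering,
# with k=3 for the sedentary / light-to-moderate / vigorous intensities.
pipeline = make_pipeline(StandardScaler(), PCA(n_components=5),
                         KMeans(n_clusters=3, n_init=10, random_state=0))
labels = pipeline.fit_predict(features)
print("windows per cluster:", np.bincount(labels))
```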

Relevance: 60.00%

Abstract:

This dissertation, whose research was conducted in the Group of Electronic and Microelectronic Design (GDEM) within the framework of the project Power Consumption Control in Multimedia Terminals (PCCMUTE), focuses on the development of an energy estimation model for a battery-powered embedded processor board. The main objectives and contributions of the work are summarized as follows. A model is proposed to obtain accurate energy estimates based on the linear correlation between performance monitoring counters (PMCs) and energy consumption. Given the uniqueness of the appropriate PMCs for each system, the modeling methodology is improved to obtain stable accuracies with only slight variations among multiple scenarios and to be repeatable on other systems. It includes two steps: the first, the PMC-filter, identifies the most suitable set among the PMCs available on a system; the second, the k-fold cross-validation method, avoids bias during the model training stage. The methodology is implemented on a commercial embedded board running the 2.6.34 Linux kernel and PAPI, a cross-platform interface for configuring and accessing PMCs. The results show that the methodology maintains good stability across different scenarios and provides robust estimation results, with an average relative error of less than 5%.
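A sketch of the two-step methodology under assumed counter data: a correlation-based PMC filter followed by k-fold cross-validation of a linear model, reporting average relative error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Illustrative PMC samples: rows are measurement intervals, columns are
# counter deltas (instructions, cache misses, ...); assumed, not GDEM data.
rng = np.random.default_rng(0)
pmcs = rng.normal(size=(400, 10))
energy = pmcs[:, :3] @ np.array([2.0, 0.7, 1.1]) \
         + rng.normal(scale=0.2, size=400)

# Step 1, "PMC-filter": keep the counters most linearly correlated with energy.
corr = np.array([abs(np.corrcoef(pmcs[:, j], energy)[0, 1])
                 for j in range(pmcs.shape[1])])
selected = np.argsort(corr)[-3:]

# Step 2: k-fold cross-validation of the linear model on the selected PMCs,
# avoiding the bias of a single train/test split.
scores = cross_val_score(LinearRegression(), pmcs[:, selected], energy,
                         cv=KFold(n_splits=10, shuffle=True, random_state=0),
                         scoring="neg_mean_absolute_percentage_error")
print(f"mean relative error: {-scores.mean():.2%}")
```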