969 results for on-disk data layout
Abstract:
We agree with Duckrow and Albano [Phys. Rev. E 67, 063901 (2003)] and Quian Quiroga et al. [Phys. Rev. E 67, 063902 (2003)] that mutual information (MI) is a useful measure of dependence for electroencephalogram (EEG) data, but we show that the improvement seen in the performance of MI in extracting dependence trends from EEG depends more on the type of MI estimator than on any embedding technique used. In an independent study conducted in search of an optimal MI estimator, in particular for EEG applications, we examined the performance of a number of MI estimators on the data set used by Quian Quiroga et al. in their original study, where the performance of different dependence measures on real data was investigated [Phys. Rev. E 65, 041903 (2002)]. We show that for EEG applications the best performance among the investigated estimators is achieved by k-nearest neighbors, which supports the conjecture by Quian Quiroga et al. in Phys. Rev. E 67, 063902 (2003) that the nearest-neighbor estimator is the most precise method for estimating MI.
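As a concrete illustration of the kind of k-nearest-neighbor MI estimation discussed above, the sketch below uses scikit-learn's kNN-based estimator (in the spirit of Kraskov-style estimators) on two synthetic coupled signals; the signals and parameter choices are illustrative, not the EEG data of the study:

```python
# Sketch: kNN-based mutual information between two coupled synthetic
# signals, using scikit-learn's Kraskov-style estimator (illustrative).
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                   # "channel 1"
y = 0.7 * x + 0.3 * rng.normal(size=n)   # "channel 2", coupled to x

# mutual_info_regression estimates MI with a k-nearest-neighbor method;
# n_neighbors (k) trades bias against variance.
mi = mutual_info_regression(x.reshape(-1, 1), y,
                            n_neighbors=3, random_state=0)[0]

noise = rng.normal(size=n)  # an independent signal for comparison
mi_indep = mutual_info_regression(x.reshape(-1, 1), noise,
                                  n_neighbors=3, random_state=0)[0]
```

The coupled pair should yield a clearly larger MI estimate than the independent pair, which is the kind of dependence trend the abstract refers to.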
Abstract:
Recently, major processor manufacturers have announced a dramatic shift in their paradigm for increasing computing power over the coming years. Instead of focusing on faster clock speeds and more powerful single-core CPUs, the trend clearly goes towards multi-core systems. This will also result in a paradigm shift for the development of algorithms for computationally expensive tasks, such as data mining applications. Obviously, work on parallel algorithms is not new per se, but concentrated efforts in the many application domains are still missing. Multi-core systems, but also clusters of workstations and even large-scale distributed computing infrastructures, provide new opportunities and pose new challenges for the design of parallel and distributed algorithms. Since data mining and machine learning systems rely on high-performance computing systems, research on the corresponding algorithms must be at the forefront of parallel algorithm research in order to keep pushing data mining and machine learning applications to be more powerful and, especially for the former, interactive. To bring together researchers and practitioners working in this exciting field, a workshop on parallel data mining was organized as part of PKDD/ECML 2006 (Berlin, Germany). The six contributions selected for the program describe various aspects of data mining and machine learning approaches featuring low to high degrees of parallelism: The first contribution addresses the classic problem of distributed association rule mining and focuses on communication efficiency to improve the state of the art. After this, a parallelization technique for speeding up decision tree construction by means of thread-level parallelism for shared memory systems is presented. The next paper discusses the design of a parallel approach for distributed memory systems to the frequent subgraph mining problem.
This approach is based on a hierarchical communication topology to solve issues related to multi-domain computational environments. The fourth paper describes the combined use and the customization of software packages to facilitate a top-down parallelism in the tuning of Support Vector Machines (SVM), and the next contribution presents an interesting idea concerning parallel training of Conditional Random Fields (CRFs) and motivates their use in labeling sequential data. The last contribution finally focuses on very efficient feature selection. It describes a parallel algorithm for feature selection from random subsets. Selecting the papers included in this volume would not have been possible without the help of an international Program Committee that provided detailed reviews for each paper. We would also like to thank Matthew Otey, who helped with publicity for the workshop.
Abstract:
The Short-term Water Information and Forecasting Tools (SWIFT) is a suite of tools for flood and short-term streamflow forecasting, consisting of a collection of hydrologic model components and utilities. Catchments are modeled using conceptual subareas and a node-link structure for channel routing. The tools comprise modules for calibration, model state updating, output error correction, ensemble runs and data assimilation. Given the combinatorial nature of the modelling experiments and the sub-daily time steps typically used for simulations, the volume of model configurations and time series data is substantial and its management is not trivial. SWIFT is currently used mostly for research purposes but has also been used operationally, with intersecting but significantly different requirements. Early versions of SWIFT used mostly ad hoc text files handled via Fortran code, with limited use of netCDF for time series data. The configuration and data handling modules have since been redesigned. The model configuration now follows a design where the data model is decoupled from the on-disk persistence mechanism. For research purposes the preferred on-disk format is JSON, to leverage numerous software libraries in a variety of languages, while retaining the legacy option of custom tab-separated text formats when that is the preferred access arrangement for the researcher. By decoupling data model and data persistence, it is much easier to use, for instance, relational databases interchangeably to provide stricter provenance and audit trail capabilities in an operational flood forecasting context. For the time series data, given the volume and required throughput, text-based formats are usually inadequate. A schema derived from CF conventions has been designed to handle time series efficiently for SWIFT.
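The decoupling of the in-memory data model from on-disk persistence described above can be sketched as follows; the class and function names are hypothetical illustrations, not SWIFT's actual API:

```python
# Sketch: a data model decoupled from its on-disk persistence backend
# (names are illustrative, not the actual SWIFT API).
import json
import os
import tempfile
from dataclasses import dataclass, asdict, field
from typing import List

@dataclass
class SubareaConfig:
    name: str
    area_km2: float

@dataclass
class CatchmentModel:
    """In-memory data model: carries no knowledge of file formats."""
    name: str
    subareas: List[SubareaConfig] = field(default_factory=list)

def save_json(model: CatchmentModel, path: str) -> None:
    """One interchangeable persistence backend; a relational-database
    backend could expose the same two-function interface."""
    with open(path, "w") as f:
        json.dump(asdict(model), f, indent=2)

def load_json(path: str) -> CatchmentModel:
    with open(path) as f:
        raw = json.load(f)
    return CatchmentModel(name=raw["name"],
                          subareas=[SubareaConfig(**s) for s in raw["subareas"]])

# Round trip: model -> JSON on disk -> model
model = CatchmentModel("demo", [SubareaConfig("upper", 12.5)])
path = os.path.join(tempfile.mkdtemp(), "catchment.json")
save_json(model, path)
roundtrip = load_json(path)
```

Because persistence lives entirely behind `save_json`/`load_json`, swapping JSON for tab-separated text or a relational database only means providing another pair of functions with the same signatures.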
Abstract:
This research aimed to identify the link between the layout of workspaces in offices and design strategies for environmental comfort. The strategies surveyed focused on thermal, visual and lighting comfort. In this research, visual comfort relates to issues of visual integration within and between the interior and exterior of the building. This is a case study conducted at the administrative headquarters of the Centro Regional Nordeste do Instituto de Pesquisas Espaciais (INPE-CRN), located in Natal/RN. The methodological strategy used was Post-Occupancy Evaluation, which combined survey data on the building (layout of workspaces, bioclimatic strategies adopted in the design, use of these strategies) with techniques aimed at acquiring qualitative information from users. The workspace layout is central to the satisfaction and productivity of workers. Issues such as concentration, communication, privacy, personal identity, density and space efficiency, and barriers (to access, views, and even ventilation and lighting), among others, are associated with the layout. Environmental comfort is one of the essential elements for maintaining quality of life in the workplace. Moreover, it is an important factor in users' perception of the space in which they are placed. Both layout and environmental comfort issues should be collected and analyzed during the programming step, so that adequate answers to these questions can be given in subsequent project phases. It was found that changes in the program over time, especially concerning persons (number and characteristics), resulted in changes in layout, generating high-density and inflexible environments. This makes it difficult to adjust the furniture to the occupants' requirements, including comfort needs.
However, the presence of strategies for environmental quality provides comfort to spaces, ensuring that, even in situations not considered optimal, users perceive the environment in a positive way. It was found that the relationship between environmental comfort and layout takes the following forms: changes in the perception of comfort depending on the layout arrangements; adjustments in layout due to comfort needs; and higher user satisfaction and environmental quality due to the presence of comfort strategies, even in situations of inadequate layout.
Abstract:
The objective of the present study was to investigate the effect of data structure on estimated genetic parameters and predicted breeding values of direct and maternal genetic effects for weaning weight (WW) and weight gain from birth to weaning (BWG), including or not the genetic covariance between direct and maternal effects. Records of 97,490 Nellore animals born between 1993 and 2006, from the Jacarezinho cattle raising farm, were used. Two different data sets were analyzed: DI_all, which included all available progenies of dams without their own performance; DII_all, which included DI_all + 20% of recorded progenies with maternal phenotypes. Two subsets were obtained from each data set (DI_all and DII_all): DI_1 and DII_1, which included only dams with three or fewer progenies; DI_5 and DII_5, which included only dams with five or more progenies. (Co)variance components and heritabilities were estimated by Bayesian inference through Gibbs sampling using univariate animal models. In general, for the population and traits studied, the proportion of dams with known phenotypic information and the number of progenies per dam influenced direct and maternal heritabilities, as well as the contribution of maternal permanent environmental variance to phenotypic variance. Only small differences were observed in the genetic and environmental parameters when the genetic covariance between direct and maternal effects was set to zero in the data sets studied. Thus, the inclusion or not of the genetic covariance between direct and maternal effects had little effect on the ranking of animals according to their breeding values for WW and BWG. Accurate estimation of genetic correlations between direct and maternal genetic effects depends on the data structure. Thus, this covariance should be set to zero in Nellore data sets in which the proportion of dams with phenotypic information is low, the number of progenies per dam is small, and pedigree relationships are poorly known. 
(c) 2012 Elsevier B.V. All rights reserved.
Abstract:
Background: The genome-wide identification of both morbid genes, i.e., those genes whose mutations cause hereditary human diseases, and druggable genes, i.e., genes coding for proteins whose modulation by small molecules elicits phenotypic effects, requires experimental approaches that are time-consuming and laborious. Thus, a computational approach which could accurately predict such genes on a genome-wide scale would be invaluable for accelerating the pace of discovery of causal relationships between genes and diseases as well as the determination of druggability of gene products. Results: In this paper we propose a machine learning-based computational approach to predict morbid and druggable genes on a genome-wide scale. For this purpose, we constructed a decision tree-based meta-classifier and trained it on datasets containing, for each morbid and druggable gene, network topological features, tissue expression profile and subcellular localization data as learning attributes. This meta-classifier correctly recovered 65% of known morbid genes with a precision of 66% and correctly recovered 78% of known druggable genes with a precision of 75%. It was then used to assign morbidity and druggability scores to genes not known to be morbid and druggable, and we showed a good match between these scores and literature data. Finally, we generated decision trees by training the J48 algorithm on the morbidity and druggability datasets to discover cellular rules for morbidity and druggability and, among the rules, we found that the number of regulating transcription factors and plasma membrane localization are the most important factors for morbidity and druggability, respectively. Conclusions: We were able to demonstrate that network topological features along with tissue expression profile and subcellular localization can reliably predict human morbid and druggable genes on a genome-wide scale.
Moreover, by constructing decision trees based on these data, we could discover cellular rules governing morbidity and druggability.
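As a rough illustration of the decision-tree approach described above, the sketch below trains scikit-learn's CART implementation (standing in for Weka's J48/C4.5 used in the paper) on fully synthetic gene features; the feature set and labels are assumptions for illustration, not the paper's datasets:

```python
# Sketch: a shallow decision tree on synthetic "gene" features, standing
# in for the paper's J48-based setup (all data here is synthetic).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 1000
# Illustrative learning attributes: number of regulating transcription
# factors, tissue-expression breadth, plasma-membrane localization flag.
n_tfs = rng.poisson(5, n)
expr_breadth = rng.random(n)
membrane = rng.integers(0, 2, n)
X = np.column_stack([n_tfs, expr_breadth, membrane])
# Synthetic "morbid" label loosely driven by TF count, echoing the
# paper's finding that regulating-TF count matters most for morbidity.
y = (n_tfs + rng.normal(0, 1.5, n) > 5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

On data generated this way, the fitted tree should split predominantly on the TF-count feature, mirroring the kind of interpretable cellular rule the authors extract.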
Abstract:
A CMOS/SOI circuit to decode PWM signals is presented as part of a body-implanted neurostimulator for visual prosthesis. Since the encoded data is the sole input to the circuit, the decoding technique is based on a double-integration concept and does not require dc filtering. Nonoverlapping control phases are internally derived from the incoming pulses, and a fast-settling comparator ensures good discrimination accuracy in the megahertz range. The circuit was integrated on a 2 μm single-metal SOI fabrication process and has an effective area of 2 mm². Typically, the measured resolution of the encoding parameter α was better than 10% at 6 MHz and V_DD = 3.3 V. Stand-by consumption is around 340 μW. Pulses with frequencies up to 15 MHz and α = 10% can be discriminated for V_DD spanning from 2.3 V to 3.3 V. Such an excellent immunity to V_DD deviations meets a design specification with respect to inherent coupling losses in transmitting data and power by means of a transcutaneous link.
Abstract:
Statistical methods have been widely employed to assess the capabilities of credit scoring classification models in order to reduce the risk of wrong decisions when granting credit facilities to clients. The predictive quality of a classification model can be evaluated based on measures such as sensitivity, specificity, predictive values, accuracy, correlation coefficients and information-theoretical measures, such as relative entropy and mutual information. In this paper we analyze the performance of a naive logistic regression model (Hosmer & Lemeshow, 1989) and a logistic regression with state-dependent sample selection model (Cramer, 2004) applied to simulated data. Also, as a case study, the methodology is illustrated on a data set extracted from a Brazilian bank portfolio. Our simulation results so far revealed that there is no statistically significant difference in terms of predictive capacity between the naive logistic regression models and the logistic regression with state-dependent sample selection models. However, there is a strong difference between the distributions of the estimated default probabilities from these two statistical modeling techniques, with the naive logistic regression models always underestimating such probabilities, particularly in the presence of balanced samples. (C) 2012 Elsevier Ltd. All rights reserved.
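A minimal sketch of the naive logistic regression side of this comparison, evaluated with sensitivity and specificity as named in the abstract; the simulated covariates and coefficients are illustrative, and the state-dependent sample-selection model of Cramer (2004) is not implemented here:

```python
# Sketch: a plain ("naive") logistic regression on simulated credit data,
# scored with sensitivity and specificity. Covariates and coefficients
# are illustrative; Cramer's sample-selection variant is omitted.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(42)
n = 5000
income = rng.normal(0, 1, n)           # illustrative client covariates
debt = rng.normal(0, 1, n)
logit = -1.0 - 1.2 * income + 1.5 * debt
p_default = 1 / (1 + np.exp(-logit))
default = rng.binomial(1, p_default)   # simulated default outcomes

X = np.column_stack([income, debt])
model = LogisticRegression().fit(X, default)
pred = model.predict(X)                # 0.5-threshold classification

tn, fp, fn, tp = confusion_matrix(default, pred).ravel()
sensitivity = tp / (tp + fn)   # true-positive rate among defaulters
specificity = tn / (tn + fp)   # true-negative rate among non-defaulters
```

The same confusion-matrix machinery applies unchanged to any competing model, which is what makes these measures convenient for the head-to-head comparison the paper performs.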
Abstract:
The diffusion of radionuclides is an important safety aspect for nuclear waste disposal in argillaceous host rocks. A long-term diffusion experiment, termed DI-A, is being carried out at the Mont Terri Rock Laboratory in the Opalinus Clay formation. The aim of this experiment is the understanding of the migration and sorption behaviour of cationic and anionic species in consolidated clays. This study reports on the experimental layout and the first results obtained from the DI-A experiment, which include the investigation of HTO, ²²Na⁺, Cs⁺, and I⁻ migration during a period of 1 year by analysing these tracers in the water circulating in the borehole. In addition, results obtained from through-diffusion experiments on small-sized samples with HTO, I⁻, and ³⁶Cl⁻ are presented. The decrease of tracer concentrations in the borehole is fastest for Cs⁺, followed by ²²Na⁺, HTO, and finally I⁻. The chemical composition of the artificial pore water in the borehole shows very little variation with time, thus indicating almost no chemical disturbance around the borehole. Through-diffusion experiments in the laboratory that were performed parallel to the bedding plane with two different methods yielded effective diffusion coefficients for HTO of 4–5 × 10⁻¹¹ m² s⁻¹ and significantly lower ones for the anions Cl⁻ and I⁻ (0.7–1.6 × 10⁻¹¹ m² s⁻¹). The results indicate the importance of anion exclusion effects arising from the negatively charged clay surfaces. Furthermore, they demonstrate the anisotropic diffusion properties of the clay formation, with significantly increased diffusion rates parallel to bedding relative to the perpendicular direction. The tracer data of the in situ experiment were successfully described with 2D diffusion models using diffusion and sorption parameters obtained from the above-mentioned and other laboratory studies. The modelling results indicate that HTO and I⁻ diffused with no retardation.
The retardation of Na⁺ and Cs⁺ could be described by empirical sorption expressions from previously derived batch sorption (Cs⁺) or diffusion (Na⁺) experiments. Overall, the obtained results demonstrate the feasibility of the technical concept to study the diffusion of nonsorbing and sorbing tracers in consolidated clays. (C) 2004 Elsevier B.V. All rights reserved.
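The effect of sorption-induced retardation on diffusive transport, central to the experiment above, can be sketched with a simple 1-D explicit finite-difference model; all parameter values below are illustrative, not those of the DI-A experiment:

```python
# Sketch: 1-D explicit finite-difference diffusion with a retardation
# factor R, illustrating why sorbing tracers spread more slowly than
# conservative ones (parameters illustrative, not DI-A values).
import numpy as np

def diffuse(De=5e-11, phi=0.15, R=1.0, L=0.1, nx=101, t_end=5e5):
    """Concentration profile after t_end seconds for a unit pulse at the
    domain centre; apparent diffusivity Da = De / (phi * R), so a larger
    retardation factor R slows the spread."""
    Da = De / (phi * R)
    dx = L / (nx - 1)
    dt = 0.4 * dx * dx / Da      # below the explicit stability limit 0.5
    c = np.zeros(nx)
    c[nx // 2] = 1.0             # initial tracer pulse (unit mass)
    for _ in range(int(t_end / dt)):
        c[1:-1] += Da * dt / dx**2 * (c[2:] - 2 * c[1:-1] + c[:-2])
    return c

c_conservative = diffuse(R=1.0)   # nonsorbing tracer, e.g. HTO
c_sorbing = diffuse(R=10.0)       # sorbing cation stays more concentrated
```

After the same elapsed time, the sorbing tracer's profile remains more sharply peaked than the conservative tracer's, the same qualitative behaviour seen for Cs⁺ relative to HTO in the borehole data.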
Abstract:
OBJECTIVE To determine the success of medical management of presumptive cervical disk herniation in dogs and variables associated with treatment outcome. DESIGN Retrospective case series. ANIMALS Dogs (n=88) with presumptive cervical disk herniation. METHODS Dogs with presumptive cervical and thoracolumbar disk herniation were identified from medical records at 2 clinics and clients were mailed a questionnaire related to the success of therapy, clinical recurrence of signs, and quality of life (QOL) as interpreted by the owner. Signalment, duration and degree of neurologic dysfunction, and medication administration were determined from medical records. RESULTS Ninety-seven percent of dogs (84/87) with complete information were described as ambulatory at initial evaluation. Successful treatment was reported for 48.9% of dogs with 33% having recurrence of clinical signs and 18.1% having therapeutic failure. Bivariable logistic regression showed that non-steroidal anti-inflammatory drug (NSAID) administration was associated with success (P=.035; odds ratio [OR]=2.52). Duration of cage rest and glucocorticoid administration were not significantly associated with success or QOL. Dogs with less-severe neurologic dysfunction were more likely to have a successful outcome (OR=2.56), but this association was not significant (P=.051). CONCLUSIONS Medical management can lead to an acceptable outcome in many dogs with presumptive cervical disk herniation. Based on these data, NSAIDs should be considered as part of the therapeutic regimen. Cage rest duration and glucocorticoid administration do not appear to benefit these dogs, but this should be interpreted cautiously because of the retrospective data collection and use of client self-administered questionnaire follow-up. CLINICAL RELEVANCE These results provide insight into the success of medical management for presumptive cervical disk herniation in dogs and may allow for refinement of treatment protocols.
Abstract:
Measures have been developed to understand tendencies in the distribution of economic activity. The merits of these measures lie in the convenience of data collection and processing. In this interim report, investigating the ability of such measures to determine the geographical spread of economic activities, we summarize the merits and limitations of the measures, and make clear that we must apply caution in their usage. As a first trial in accessing areal data, this project focuses on administrative areas, not on point data and input-output data. Firm-level data is not within the scope of this article. The rest of this article is organized as follows. In Section 2, we touch on the limitations and problems associated with the measures and areal data. Specific measures are introduced in Section 3, and applied in Section 4. The conclusion summarizes the findings and discusses future work.
Abstract:
Spatial data are being increasingly used in a wide range of disciplines, a fact that is clearly reflected in the recent trend to add spatial dimensions to the conventional social sciences. Economics is by no means an exception. On one hand, spatial data are indispensable to many branches of economics such as economic geography, new economic geography, or spatial economics. On the other hand, macroeconomic data are becoming available at more and more micro levels, so that academics and analysts take it for granted that they are available not only for an entire country, but also for more detailed levels (e.g. state, province, and even city). The term 'spatial economic data' as used in this report refers to any economic data that has spatial information attached. This spatial information can be the coordinates of a location at best, or a less precise place name such as that used to describe administrative units. Obviously, the latter cannot be used without a map of the corresponding administrative units. Maps are therefore indispensable to the analysis of spatial economic data without absolute coordinates. The aim of this report is to review the availability of spatial economic data that pertains specifically to Laos and the academic studies conducted on such data up to the present. With regard to the availability of spatial economic data, efforts have been made to identify not only data that has been made available as geographic information systems (GIS) data, but also data with sufficient place labels attached. The rest of the report is organized as follows. Section 2 reviews the maps available for Laos, both in hard copy and editable electronic formats. Section 3 summarizes the spatial economic data available for Laos at the present time, and Section 4 reviews and categorizes the many economic studies utilizing these spatial data. Section 5 gives examples of some of the spatial industrial data collected for this research.
Section 6 provides a summary of the findings and gives some indication of the direction of the final report due for completion in fiscal 2010.
Abstract:
A “Collaborative Agreement” involving the collective participation of our students in the last year of our “Nuclear Engineering Master Degree Programme” for “the review and capture of selected spent fuel isotopic assay data sets to be included in the new SFCOMPO database”.