969 results for "on-disk data layout"


Relevance: 100.00%

Abstract:

Diabetes patients may suffer from an unhealthy lifestyle, long-term treatment, and chronic complications. Decreasing the hospitalization rate is a crucial problem for health care centers. This study combines the bagging method, with decision trees as base classifiers, and cost-sensitive analysis to classify diabetes patients. Real patient data collected from a regional hospital in Thailand were analyzed. Relevant factors were selected and used to construct base decision-tree classifiers to separate diabetic from non-diabetic patients. The bagging method was then applied to improve accuracy. Finally, asymmetric classification cost matrices were used to provide alternative models for diabetes data analysis.
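The approach above, bagging a decision-tree base classifier under an asymmetric cost matrix, can be sketched in miniature with one-level trees (decision stumps). The 5:1 false-negative-to-false-positive cost ratio and the synthetic data below are illustrative assumptions, not values from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_stump(X, y, costs):
    """Pick the (feature, threshold, polarity) decision stump that
    minimises total misclassification cost; costs = (c_fn, c_fp)."""
    c_fn, c_fp = costs
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = (pol * (X[:, j] - t) > 0).astype(int)
                cost = c_fn * np.sum((y == 1) & (pred == 0)) \
                     + c_fp * np.sum((y == 0) & (pred == 1))
                if cost < best[0]:
                    best = (cost, j, t, pol)
    return best[1:]

def bagged_predict(X_tr, y_tr, X_te, n_rounds=15, costs=(5, 1)):
    """Bagging: one stump per bootstrap resample, majority vote."""
    votes = np.zeros(len(X_te))
    for _ in range(n_rounds):
        idx = rng.integers(0, len(X_tr), len(X_tr))
        j, t, pol = fit_stump(X_tr[idx], y_tr[idx], costs)
        votes += (pol * (X_te[:, j] - t) > 0).astype(int)
    return (votes > n_rounds / 2).astype(int)

# synthetic, illustrative data: "diabetic" iff feature 0 exceeds 0.5
X = rng.random((200, 3))
y = (X[:, 0] > 0.5).astype(int)
accuracy = float((bagged_predict(X, y, X) == y).mean())
```

The cost matrix enters only through the stump objective, so raising `c_fn` biases each base model toward catching positives, while bagging averages away the variance of the individual stumps.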

Relevance: 100.00%

Abstract:

Irish Times article, 1 February 1999

Relevance: 100.00%

Abstract:

Observational studies demonstrate strong associations between deficient serum vitamin D (25(OH)D) levels and cardiovascular disease. To further examine the association between vitamin D and hypertension (HTN), data from the 2003-2006 National Health and Nutrition Examination Survey were analyzed to assess whether the association between vitamin D and HTN varies with the sufficiency of key co-nutrients necessary for metabolic vitamin D reactions to occur. Logistic regression results demonstrate independent effect modification by calcium, magnesium, and vitamin A of the association between vitamin D and HTN. Among non-pregnant adults with adequate renal function, those with low levels of calcium, magnesium, and vitamin D had 1.75 times the odds of HTN compared to those with sufficient vitamin D levels (p < 0.0001). Additionally, participants with low levels of calcium, magnesium, vitamin A, and vitamin D had 5.43 times the odds of HTN compared to those with vitamin D sufficiency (p = 0.0103).
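The quantities reported above are odds ratios from logistic regression; for a single binary exposure the same quantity reduces to the cross-product of a 2x2 table. A minimal sketch, with all counts invented purely for illustration:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table:
    a = exposed cases,    b = exposed non-cases,
    c = unexposed cases,  d = unexposed non-cases."""
    return (a / b) / (c / d)

# hypothetical counts: low co-nutrient + low vitamin D vs. sufficient
or_est = odds_ratio(30, 70, 20, 80)
```

An odds ratio above 1 indicates higher odds of the outcome in the exposed group; the survey analysis additionally adjusts for covariates, which this bare cross-product does not.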

Relevance: 100.00%

Abstract:

Semantics, knowledge, and Grids represent three spaces where people interact, understand, learn, and create. Grids represent advanced cyber-infrastructures and their evolution. Big data influence the evolution of semantics, knowledge, and Grids. Exploring semantics, knowledge, and Grids on big data helps accelerate the shift of scientific paradigms, the fourth industrial revolution, and the transformational innovation of technologies.

Relevance: 100.00%

Abstract:

Big data are reshaping the way we interact with technology, fostering new applications that aim to improve the safety assessment of foods. An extraordinary amount of information is analysed using machine learning approaches aimed at detecting existing risks or predicting the likelihood of future ones. Food business operators have to share the results of these analyses when applying to place regulated products on the market, while agri-food safety agencies (including the European Food Safety Authority) are exploring new avenues to increase the accuracy of their evaluations by processing Big data. Such an informational endowment brings with it opportunities and risks correlated with the extraction of meaningful inferences from data. However, conflicting interests and tensions among the involved entities (the industry, food safety agencies, and consumers) hinder agreement on shared methods to steer the processing of Big data in a sound, transparent, and trustworthy way. A recent reform of the EU sectoral legislation, the lack of trust, and the presence of a considerable number of stakeholders highlight the need for ethical contributions aimed at steering the development and deployment of Big data applications. Moreover, Artificial Intelligence guidelines and charters published by European Union institutions and Member States have to be discussed in light of applied contexts, including the one at stake here. This thesis aims to contribute to these goals by discussing which principles should be put forward when processing Big data in the context of agri-food safety risk assessment. The research focuses on two intertwined topics, data ownership and data governance, by evaluating how the regulatory framework addresses the challenges raised by Big data analysis in these domains. The outcome of the project is a tentative Roadmap identifying the principles to be observed when processing Big data in this domain and their possible implementations.

Relevance: 100.00%

Abstract:

Artificial Intelligence (AI) and Machine Learning (ML) are novel data analysis techniques providing very accurate prediction results. They are widely adopted in a variety of industries to improve efficiency and decision-making, and they are also used to develop intelligent systems. Their success rests upon complex mathematical models whose decisions and rationale are usually difficult for human users to comprehend, to the point of being dubbed black boxes. This is particularly relevant in sensitive and highly regulated domains. To mitigate and possibly solve this issue, the Explainable AI (XAI) field became prominent in recent years. XAI consists of models and techniques that enable understanding of the intricate patterns discovered by black-box models. In this thesis, we consider model-agnostic XAI techniques applicable to tabular data, with a particular focus on the credit scoring domain. Special attention is dedicated to the LIME framework, for which we propose several modifications to the vanilla algorithm, in particular: a pair of complementary Stability Indices that accurately measure LIME stability, and the OptiLIME policy, which helps the practitioner find the proper balance between explanation stability and reliability. We subsequently put forward GLEAMS, a model-agnostic interpretable surrogate model that needs to be trained only once while providing both local and global explanations of the black-box model. GLEAMS produces feature attributions and what-if scenarios from both the dataset and model perspectives. Finally, we argue that synthetic data are an emerging trend in AI, increasingly used in place of original data to train complex models. To be able to explain the outcomes of such models, we must guarantee that synthetic data are reliable enough for their explanations to carry over to real-world individuals. To this end, we propose DAISYnt, a suite of tests to measure the quality and privacy of synthetic tabular data.
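The idea the thesis builds on, fitting a proximity-weighted linear surrogate around a single instance of a black-box model, can be sketched in numpy. This is a minimal sketch of the local-surrogate principle behind LIME, not the vanilla LIME implementation; the black-box function, kernel width, and sampling scale are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def black_box(X):
    """Stand-in black-box model (assumed, for illustration only)."""
    return X[:, 0] ** 2 + 3.0 * X[:, 1]

def local_surrogate(f, x0, n_samples=2000, scale=0.1, kernel_width=0.2):
    """Perturb around x0, weight samples by proximity, and fit a
    weighted linear model whose slopes act as feature attributions."""
    X = x0 + rng.normal(0.0, scale, size=(n_samples, len(x0)))
    y = f(X)
    d2 = np.sum((X - x0) ** 2, axis=1)
    w = np.exp(-d2 / kernel_width ** 2)              # proximity kernel
    A = np.hstack([X - x0, np.ones((n_samples, 1))])  # centred design + bias
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw.ravel(), rcond=None)
    return coef[:-1]                                  # drop the bias term

x0 = np.array([1.0, 2.0])
attr = local_surrogate(black_box, x0)
```

At x0 = (1, 2) the local gradient of the stand-in model is (2, 3), so the attributions should land close to those values; the stability issues the thesis targets arise precisely because repeated runs of this random sampling give slightly different coefficients.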

Relevance: 100.00%

Abstract:

During the last semester of the Master’s Degree in Artificial Intelligence, I carried out my internship at TXT e-Solution, working on the ADMITTED project. This paper describes the work done in those months. The thesis is divided into two parts, representing the two tasks I was assigned during the course of my experience. The first part introduces the project and the work done on the admittedly library: maintaining the code base and writing the test suites. This work is closer to a software engineering role, involving developing features, fixing bugs, and testing. The second part describes the experiments done on the anomaly detection task using a deep learning technique called an autoencoder; this task is closer to a data science role. The two tasks were not done simultaneously but one after the other, which is why I preferred to divide them into two separate parts of this paper.
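Autoencoder-based anomaly detection flags inputs whose reconstruction error exceeds a threshold learned from normal data. To keep the sketch self-contained, PCA is used below as a linear stand-in for the autoencoder's bottleneck; the data, dimensions, and 99th-percentile threshold are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

def fit_linear_autoencoder(X, k):
    """PCA as a linear stand-in for an autoencoder: the top-k
    principal directions play the role of the bottleneck layer."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k].T                       # mean and encoder weights

def reconstruction_error(X, mu, W):
    Z = (X - mu) @ W                          # encode
    X_hat = Z @ W.T + mu                      # decode
    return np.sum((X - X_hat) ** 2, axis=1)

# "normal" data lives near a 2-D plane inside 5-D space (illustrative)
normal = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5)) \
       + 0.05 * rng.normal(size=(500, 5))
mu, W = fit_linear_autoencoder(normal, k=2)

# flag anything whose error exceeds the 99th percentile on normal data
threshold = np.quantile(reconstruction_error(normal, mu, W), 0.99)
anomaly = 5.0 * rng.normal(size=(1, 5))
flagged = reconstruction_error(anomaly, mu, W) > threshold
```

A trained deep autoencoder replaces the SVD step with nonlinear encode/decode networks, but the detection logic (reconstruction error against a threshold from normal data) is the same.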

Relevance: 100.00%

Abstract:

Disk drives are the bottleneck in the processing of large amounts of data in almost all common applications. File systems attempt to reduce this bottleneck by storing data sequentially on the disk drives, thereby reducing access latencies. Although this strategy is useful when data is retrieved sequentially, the access patterns in real-world workloads are not necessarily sequential, and this mismatch results in storage I/O performance degradation. This thesis demonstrates that one way to improve storage performance is to reorganize data on disk drives in the same way in which it is mostly accessed. We identify two classes of accesses: static, where access patterns do not change over the lifetime of the data, and dynamic, where access patterns change frequently over short durations of time, and we propose, implement, and evaluate layout strategies for each. Our strategies are implemented so that they can be seamlessly integrated into or removed from the system as desired. We evaluate our layout strategies for static policies using tree-structured XML data, where accesses to the storage device are mostly of two kinds: parent-to-child or child-to-sibling. Our results show that for a specific class of deep-focused queries, the existing file system layout policy performs better, by 5–54X. For non-deep-focused queries, our native layout mechanism shows an improvement of 3–127X. To improve the performance of dynamic access patterns, we implement a self-optimizing storage system that rearranges popular blocks on a dedicated partition based on observed workload characteristics. Our evaluation shows an improvement of over 80% in disk busy times over a range of workloads. These results show that applying knowledge of data access patterns to allocation decisions can substantially improve I/O performance.
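The dynamic-layout idea (observe which blocks are hot, then give them contiguous addresses) can be sketched with a toy seek-cost model. The trace and the linear head-travel cost proxy below are illustrative assumptions, not the thesis's actual mechanism:

```python
from collections import Counter

def seek_cost(trace):
    """Toy positioning-cost proxy: total head travel between
    consecutive logical block addresses."""
    return sum(abs(b - a) for a, b in zip(trace, trace[1:]))

def popularity_remap(trace):
    """Self-optimising layout sketch: give the most frequently
    accessed blocks contiguous addresses, hottest first."""
    ranked = [blk for blk, _ in Counter(trace).most_common()]
    new_addr = {blk: i for i, blk in enumerate(ranked)}
    return [new_addr[blk] for blk in trace]

# hot blocks scattered across the address space (illustrative trace)
trace = [0, 5000, 9000, 5000, 0, 9000] * 50
before = seek_cost(trace)
after = seek_cost(popularity_remap(trace))
```

Co-locating the hot blocks collapses the long seeks between them, which is the effect the dedicated reorganization partition exploits.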

Relevance: 100.00%

Abstract:

In this work, an axisymmetric two-dimensional finite element model was developed to simulate instrumented indentation testing of thin ceramic films deposited onto hard steel substrates. The level of film residual stress (sigma(r)), the film elastic modulus (E), and the film work-hardening exponent (n) were varied to analyze their effects on indentation data. These numerical results were used to analyze experimental data obtained with titanium nitride coated specimens, in which the substrate bias applied during deposition was modified to obtain films with different levels of sigma(r). Good qualitative correlation was obtained when numerical and experimental results were compared, as long as all film properties were considered in the analyses, and not only sigma(r). The numerical analyses were also used to further understand the effect of sigma(r) on the mechanical properties calculated from instrumented indentation data. In this case, the hardness values obtained based on real or calculated contact areas are similar only when sink-in occurs, i.e. with high n or a high Y/E ratio, where Y is the yield strength of the film. In an additional analysis, four ratios (R/h(max)) between indenter tip radius and maximum penetration depth were simulated to analyze the combined effects of R and sigma(r) on the indentation load-displacement curves. In this case, sigma(r) did not significantly affect the load curve exponent, which was affected only by the indenter tip radius. On the other hand, the proportional curvature coefficient was significantly affected by sigma(r) and n. (C) 2010 Elsevier B.V. All rights reserved.
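The load-curve exponent and proportional curvature coefficient discussed above describe a loading curve of the form P = C·h^m. Given load-displacement data, both can be recovered with a log-log least-squares fit; the synthetic curve below (C = 2.0, m = 1.9) is purely illustrative:

```python
import numpy as np

def fit_loading_curve(h, P):
    """Recover C and m in P = C * h**m by least squares in
    log-log space (Kick's-law form of the loading curve)."""
    m, logC = np.polyfit(np.log(h), np.log(P), 1)
    return np.exp(logC), m

# synthetic load-displacement data: C = 2.0, m = 1.9 (illustrative)
h = np.linspace(0.1, 1.0, 50)
P = 2.0 * h ** 1.9
C, m = fit_loading_curve(h, P)
```

On noise-free power-law data the fit is exact; on real indentation data the recovered m and C are the quantities the abstract reports as sensitive to tip radius and to sigma(r) and n, respectively.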

Relevance: 100.00%

Abstract:

One of the challenges in scientific visualization is to generate software libraries suitable for the large-scale data emerging from tera-scale simulations and instruments. We describe the efforts currently under way at SDSC and NPACI to address these challenges. The scope of the SDSC project spans data handling, graphics, visualization, and scientific application domains. Components of the research focus on the following areas: intelligent data storage, layout, and handling, using an associated “Floor-Plan” (metadata); performance optimization on parallel architectures; extension of SDSC’s scalable, parallel, direct volume renderer to allow perspective viewing; and interactive rendering of fractional images (“imagelets”), which facilitates the examination of large datasets. These concepts are coordinated within a data-visualization pipeline, which operates on component data blocks sized to fit within the available computing resources. A key feature of the scheme is that the metadata tagging the data blocks can be propagated and applied consistently: at the disk level, in distributing the computations across parallel processors, in “imagelet” composition, and in feature tagging. The work reflects the emerging challenges and opportunities presented by ongoing progress in high-performance computing (HPC) and the deployment of the data, computational, and visualization Grids.
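The block-oriented pipeline described above (data split into resource-sized blocks, each carrying metadata tags that travel through the processing stages) can be sketched as follows; the block size, metadata fields, and stages are illustrative assumptions, not the SDSC design:

```python
import numpy as np

def make_blocks(data, block_size):
    """Split a dataset into resource-sized blocks, each tagged with
    metadata (a toy stand-in for the 'Floor-Plan')."""
    for i in range(0, len(data), block_size):
        chunk = data[i:i + block_size]
        yield {"offset": i, "size": len(chunk)}, chunk

def pipeline(data, block_size, stages):
    """Run each stage block by block; the metadata tag travels
    with its block through every stage."""
    total = 0.0
    for meta, chunk in make_blocks(data, block_size):
        for stage in stages:
            chunk = stage(chunk)
        total += float(chunk.sum())
    return total

data = np.arange(10.0)
result = pipeline(data, block_size=4, stages=[lambda c: c * 2])
```

Because no block ever exceeds `block_size`, peak memory stays bounded regardless of the dataset size, which is the property that lets such a pipeline scale to tera-scale inputs.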

Relevance: 100.00%

Abstract:

Purpose: Orthodontic miniscrews are commonly used to achieve absolute anchorage during tooth movement. One of the most frequent complications is screw loss as a result of root contact. Increased precision during miniscrew insertion would help prevent screw loss and potential root damage, improving treatment outcomes. Stereolithographic surgical guides have commonly been used for prosthetic implants to increase the precision of insertion. The objective of this paper was to describe the use of a stereolithographic surgical guide suitable for one-component orthodontic miniscrews based on cone beam computed tomography (CBCT) data and to evaluate implant placement accuracy. Materials and Methods: Acrylic splints were adapted to the dental arches of four patients, and six radiopaque reference points were filled with gutta-percha. The patients underwent CBCT while wearing the occlusal splint. Another series of images was captured with the splint alone. After superimposition and segmentation, miniscrew insertion was simulated using planning software that allowed the user to check the implant position in all planes and in three dimensions. In a rapid-prototyping machine, a stereolithographic guide was fabricated with metallic sleeves located at the insertion points to allow three-dimensional control of the pilot bur. The surgical guide was worn during surgery. After implant insertion, each patient underwent CBCT a second time to verify the implant position and the accuracy of miniscrew placement. Results: The average differences between the planned and inserted positions for the ten miniscrews were 0.86 mm at the coronal end, 0.71 mm at the center, and 0.87 mm at the apical tip. The average angular discrepancy was 1.76 degrees. Conclusions: The use of stereolithographic surgical guides based on CBCT data allows for accurate orthodontic miniscrew insertion without damaging neighboring anatomic structures.
INT J ORAL MAXILLOFAC IMPLANTS 2011;26:860-865
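The accuracy figures above are mean deviations between planned and actually inserted landmark positions. Given 3-D coordinates for each landmark, they reduce to averaged Euclidean distances; the coordinates below are made up for illustration:

```python
import math

def mean_deviation(planned, actual):
    """Mean Euclidean distance between planned and inserted
    landmark positions (same units as the inputs, e.g. mm)."""
    dists = [math.dist(p, a) for p, a in zip(planned, actual)]
    return sum(dists) / len(dists)

# made-up coordinates for two screws, in mm
planned = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
actual  = [(0.6, 0.0, 0.0), (10.0, 10.8, 10.0)]
dev = mean_deviation(planned, actual)     # (0.6 + 0.8) / 2 = 0.7
```

In the study this calculation would be repeated separately at the coronal, center, and apical landmarks of each miniscrew.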

Relevance: 100.00%

Abstract:

The class of Schoenberg transformations, embedding Euclidean distances into higher dimensional Euclidean spaces, is presented, and derived from theorems on positive definite and conditionally negative definite matrices. Original results on the arc lengths, angles and curvature of the transformations are proposed, and visualized on artificial data sets by classical multidimensional scaling. A distance-based discriminant algorithm and a robust multidimensional centroid estimate illustrate the theory, closely connected to the Gaussian kernels of Machine Learning.
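A classical Schoenberg transformation is the componentwise power d → d^a with 0 < a ≤ 1, which keeps a Euclidean distance matrix Euclidean. The numpy sketch below checks the underlying positive-semidefiniteness criterion before and after such a transformation (random points, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def is_euclidean(sq_dists, tol=1e-8):
    """Schoenberg's criterion: a squared-distance matrix D is
    Euclidean iff G = -1/2 * J D J is positive semi-definite,
    where J = I - (1/n) * ones is the centring matrix."""
    n = len(sq_dists)
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ sq_dists @ J
    return bool(np.linalg.eigvalsh(G).min() > -tol)

X = rng.normal(size=(10, 3))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

ok_original = is_euclidean(sq)               # Euclidean by construction
ok_transformed = is_euclidean(np.sqrt(sq))   # d -> d**0.5 stays Euclidean
```

The matrix G is exactly the Gram matrix used by classical multidimensional scaling, which is how the transformed configurations in the paper are visualized.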

Relevance: 100.00%

Abstract:

Resolving the relationships between Metazoa and other eukaryotic groups, as well as between metazoan phyla, is central to the understanding of the origin and evolution of animals. The current view is based on limited data sets, either a single gene with many species (e.g., ribosomal RNA) or many genes but with only a few species. Because a reliable phylogenetic inference simultaneously requires numerous genes and numerous species, we assembled a very large data set containing 129 orthologous proteins (~30,000 aligned amino acid positions) for 36 eukaryotic species. Included in the alignments are data from the choanoflagellate Monosiga ovata, obtained through the sequencing of about 1,000 cDNAs. We provide conclusive support for choanoflagellates as the closest relative of animals and for fungi as the second closest. The monophyly of Plantae and chromalveolates was recovered but without strong statistical support. Within animals, in contrast to the monophyly of Coelomata observed in several recent large-scale analyses, we recovered a paraphyletic Coelomata, with nematodes and platyhelminths nested within. To include a diverse sample of organisms, data from EST projects were used for several species, resulting in a large amount of missing data in our alignment (about 25%). Using different approaches, we verify that the inferred phylogeny is not sensitive to these missing data. Therefore, this large data set provides a reliable phylogenetic framework for studying eukaryotic and animal evolution and will be easily extendable when large amounts of sequence information become available from a broader taxonomic range.