901 resultados para Techniques of data analysis


Relevância:

100.00% 100.00%

Publicador:

Resumo:

This dissertation develops a new mathematical approach that overcomes the effect of a data processing phenomenon known as "histogram binning" inherent to flow cytometry data. A real-time procedure is introduced to prove the effectiveness and fast implementation of such an approach on real-world data. The histogram binning effect is a dilemma posed by two seemingly antagonistic developments: (1) flow cytometry data in its histogram form is extended in its dynamic range to improve its analysis and interpretation, and (2) the inevitable dynamic range extension introduces an unwelcome side effect, the binning effect, which skews the statistics of the data, undermining as a consequence the accuracy of the analysis and the eventual interpretation of the data. Researchers in the field contended with such a dilemma for many years, resorting either to hardware approaches that are rather costly with inherent calibration and noise effects; or have developed software techniques based on filtering the binning effect but without successfully preserving the statistical content of the original data. The mathematical approach introduced in this dissertation is so appealing that a patent application has been filed. The contribution of this dissertation is an incremental scientific innovation based on a mathematical framework that will allow researchers in the field of flow cytometry to improve the interpretation of data knowing that its statistical meaning has been faithfully preserved for its optimized analysis. Furthermore, with the same mathematical foundation, proof of the origin of such an inherent artifact is provided. These results are unique in that new mathematical derivations are established to define and solve the critical problem of the binning effect faced at the experimental assessment level, providing a data platform that preserves its statistical content. In addition, a novel method for accumulating the log-transformed data was developed. This new method uses the properties of the transformation of statistical distributions to accumulate the output histogram in a non-integer and multi-channel fashion. Although the mathematics of this new mapping technique seem intricate, the concise nature of the derivations allow for an implementation procedure that lends itself to a real-time implementation using lookup tables, a task that is also introduced in this dissertation.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

A substantial amount of information on the Internet is present in the form of text. The value of this semi-structured and unstructured data has been widely acknowledged, with consequent scientific and commercial exploitation. The ever-increasing data production, however, pushes data analytic platforms to their limit. This thesis proposes techniques for more efficient textual big data analysis suitable for the Hadoop analytic platform. This research explores the direct processing of compressed textual data. The focus is on developing novel compression methods with a number of desirable properties to support text-based big data analysis in distributed environments. The novel contributions of this work include the following. Firstly, a Content-aware Partial Compression (CaPC) scheme is developed. CaPC makes a distinction between informational and functional content in which only the informational content is compressed. Thus, the compressed data is made transparent to existing software libraries which often rely on functional content to work. Secondly, a context-free bit-oriented compression scheme (Approximated Huffman Compression) based on the Huffman algorithm is developed. This uses a hybrid data structure that allows pattern searching in compressed data in linear time. Thirdly, several modern compression schemes have been extended so that the compressed data can be safely split with respect to logical data records in distributed file systems. Furthermore, an innovative two layer compression architecture is used, in which each compression layer is appropriate for the corresponding stage of data processing. Peripheral libraries are developed that seamlessly link the proposed compression schemes to existing analytic platforms and computational frameworks, and also make the use of the compressed data transparent to developers. The compression schemes have been evaluated for a number of standard MapReduce analysis tasks using a collection of real-world datasets. In comparison with existing solutions, they have shown substantial improvement in performance and significant reduction in system resource requirements.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This work outlines the theoretical advantages of multivariate methods in biomechanical data, validates the proposed methods and outlines new clinical findings relating to knee osteoarthritis that were made possible by this approach. New techniques were based on existing multivariate approaches, Partial Least Squares (PLS) and Non-negative Matrix Factorization (NMF) and validated using existing data sets. The new techniques developed, PCA-PLS-LDA (Principal Component Analysis – Partial Least Squares – Linear Discriminant Analysis), PCA-PLS-MLR (Principal Component Analysis – Partial Least Squares –Multiple Linear Regression) and Waveform Similarity (based on NMF) were developed to address the challenging characteristics of biomechanical data, variability and correlation. As a result, these new structure-seeking technique revealed new clinical findings. The first new clinical finding relates to the relationship between pain, radiographic severity and mechanics. Simultaneous analysis of pain and radiographic severity outcomes, a first in biomechanics, revealed that the knee adduction moment’s relationship to radiographic features is mediated by pain in subjects with moderate osteoarthritis. The second clinical finding was quantifying the importance of neuromuscular patterns in brace effectiveness for patients with knee osteoarthritis. I found that brace effectiveness was more related to the patient’s unbraced neuromuscular patterns than it was to mechanics, and that these neuromuscular patterns were more complicated than simply increased overall muscle activity, as previously thought.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Data mining can be defined as the extraction of implicit, previously un-known, and potentially useful information from data. Numerous re-searchers have been developing security technology and exploring new methods to detect cyber-attacks with the DARPA 1998 dataset for Intrusion Detection and the modified versions of this dataset KDDCup99 and NSL-KDD, but until now no one have examined the performance of the Top 10 data mining algorithms selected by experts in data mining. The compared classification learning algorithms in this thesis are: C4.5, CART, k-NN and Naïve Bayes. The performance of these algorithms are compared with accuracy, error rate and average cost on modified versions of NSL-KDD train and test dataset where the instances are classified into normal and four cyber-attack categories: DoS, Probing, R2L and U2R. Additionally the most important features to detect cyber-attacks in all categories and in each category are evaluated with Weka’s Attribute Evaluator and ranked according to Information Gain. The results show that the classification algorithm with best performance on the dataset is the k-NN algorithm. The most important features to detect cyber-attacks are basic features such as the number of seconds of a network connection, the protocol used for the connection, the network service used, normal or error status of the connection and the number of data bytes sent. The most important features to detect DoS, Probing and R2L attacks are basic features and the least important features are content features. Unlike U2R attacks, where the content features are the most important features to detect attacks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Cranial cruciate ligament (CCL) deficiency is the leading cause of lameness affecting the stifle joints of large breed dogs, especially Labrador Retrievers. Although CCL disease has been studied extensively, its exact pathogenesis and the primary cause leading to CCL rupture remain controversial. However, weakening secondary to repetitive microtrauma is currently believed to cause the majority of CCL instabilities diagnosed in dogs. Techniques of gait analysis have become the most productive tools to investigate normal and pathological gait in human and veterinary subjects. The inverse dynamics analysis approach models the limb as a series of connected linkages and integrates morphometric data to yield information about the net joint moment, patterns of muscle power and joint reaction forces. The results of these studies have greatly advanced our understanding of the pathogenesis of joint diseases in humans. A muscular imbalance between the hamstring and quadriceps muscles has been suggested as a cause for anterior cruciate ligament rupture in female athletes. Based on these findings, neuromuscular training programs leading to a relative risk reduction of up to 80% has been designed. In spite of the cost and morbidity associated with CCL disease and its management, very few studies have focused on the inverse dynamics gait analysis of this condition in dogs. The general goals of this research were (1) to further define gait mechanism in Labrador Retrievers with and without CCL-deficiency, (2) to identify individual dogs that are susceptible to CCL disease, and (3) to characterize their gait. The mass, location of the center of mass (COM), and mass moment of inertia of hind limb segments were calculated using a noninvasive method based on computerized tomography of normal and CCL-deficient Labrador Retrievers. Regression models were developed to determine predictive equations to estimate body segment parameters on the basis of simple morphometric measurements, providing a basis for nonterminal studies of inverse dynamics of the hind limbs in Labrador Retrievers. Kinematic, ground reaction forces (GRF) and morphometric data were combined in an inverse dynamics approach to compute hock, stifle and hip net moments, powers and joint reaction forces (JRF) while trotting in normal, CCL-deficient or sound contralateral limbs. Reductions in joint moment, power, and loads observed in CCL-deficient limbs were interpreted as modifications adopted to reduce or avoid painful mobilization of the injured stifle joint. Lameness resulting from CCL disease affected predominantly reaction forces during the braking phase and the extension during push-off. Kinetics also identified a greater joint moment and power of the contralateral limbs compared with normal, particularly of the stifle extensor muscles group, which may correlate with the lameness observed, but also with the predisposition of contralateral limbs to CCL deficiency in dogs. For the first time, surface EMG patterns of major hind limb muscles during trotting gait of healthy Labrador Retrievers were characterized and compared with kinetic and kinematic data of the stifle joint. The use of surface EMG highlighted the co-contraction patterns of the muscles around the stifle joint, which were documented during transition periods between flexion and extension of the joint, but also during the flexion observed in the weight bearing phase. Identification of possible differences in EMG activation characteristics between healthy patients and dogs with or predisposed to orthopedic and neurological disease may help understanding the neuromuscular abnormality and gait mechanics of such disorders in the future. Conformation parameters, obtained from femoral and tibial radiographs, hind limb CT images, and dual-energy X-ray absorptiometry, of hind limbs predisposed to CCL deficiency were compared with the conformation parameters from hind limbs at low risk. A combination of tibial plateau angle and femoral anteversion angle measured on radiographs was determined optimal for discriminating predisposed and non-predisposed limbs for CCL disease in Labrador Retrievers using a receiver operating characteristic curve analysis method. In the future, the tibial plateau angle (TPA) and femoral anteversion angle (FAA) may be used to screen dogs suspected of being susceptible to CCL disease. Last, kinematics and kinetics across the hock, stifle and hip joints in Labrador Retrievers presumed to be at low risk based on their radiographic TPA and FAA were compared to gait data from dogs presumed to be predisposed to CCL disease for overground and treadmill trotting gait. For overground trials, extensor moment at the hock and energy generated around the hock and stifle joints were increased in predisposed limbs compared to non predisposed limbs. For treadmill trials, dogs qualified as predisposed to CCL disease held their stifle at a greater degree of flexion, extended their hock less, and generated more energy around the stifle joints while trotting on a treadmill compared with dogs at low risk. This characterization of the gait mechanics of Labrador Retrievers at low risk or predisposed to CCL disease may help developing and monitoring preventive exercise programs to decrease gastrocnemius dominance and strengthened the hamstring muscle group.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

For a long time, electronic data analysis has been associated with quantitative methods. However, Computer Assisted Qualitative Data Analysis Software (CAQDAS) are increasingly being developed. Although the CAQDAS has been there for decades, very few qualitative health researchers report using it. This may be due to the difficulties that one has to go through to master the software and the misconceptions that are associated with using CAQDAS. While the issue of mastering CAQDAS has received ample attention, little has been done to address the misconceptions associated with CAQDAS. In this paper, the author reflects on his experience of interacting with one of the popular CAQDAS (NVivo) in order to provide evidence-based implications of using the software. The key message is that unlike statistical software, the main function of CAQDAS is not to analyse data but rather to aid the analysis process, which the researcher must always remain in control of. In other words, researchers must equally know that no software can analyse qualitative data. CAQDAS are basically data management packages, which support the researcher during analysis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Comunicação apresentada na 44th SEFI Conference, 12-­15 September 2016, Tampere, Finland

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The aim of this novel experimental study is to investigate the behaviour of a 2m x 2m model of a masonry groin vault, which is built by the assembly of blocks made of a 3D-printed plastic skin filled with mortar. The choice of the groin vault is due to the large presence of this vulnerable roofing system in the historical heritage. Experimental tests on the shaking table are carried out to explore the vault response on two support boundary conditions, involving four lateral confinement modes. The data processing of markers displacement has allowed to examine the collapse mechanisms of the vault, based on the arches deformed shapes. There then follows a numerical evaluation, to provide the orders of magnitude of the displacements associated to the previous mechanisms. Given that these displacements are related to the arches shortening and elongation, the last objective is the definition of a critical elongation between two diagonal bricks and consequently of a diagonal portion. This study aims to continue the previous work and to take another step forward in the research of ground motion effects on masonry structures.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Today’s data are increasingly complex and classical statistical techniques need growingly more refined mathematical tools to be able to model and investigate them. Paradigmatic situations are represented by data which need to be considered up to some kind of trans- formation and all those circumstances in which the analyst finds himself in the need of defining a general concept of shape. Topological Data Analysis (TDA) is a field which is fundamentally contributing to such challenges by extracting topological information from data with a plethora of interpretable and computationally accessible pipelines. We con- tribute to this field by developing a series of novel tools, techniques and applications to work with a particular topological summary called merge tree. To analyze sets of merge trees we introduce a novel metric structure along with an algorithm to compute it, define a framework to compare different functions defined on merge trees and investigate the metric space obtained with the aforementioned metric. Different geometric and topolog- ical properties of the space of merge trees are established, with the aim of obtaining a deeper understanding of such trees. To showcase the effectiveness of the proposed metric, we develop an application in the field of Functional Data Analysis, working with functions up to homeomorphic reparametrization, and in the field of radiomics, where each patient is represented via a clustering dendrogram.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

LHC experiments produce an enormous amount of data, estimated of the order of a few PetaBytes per year. Data management takes place using the Worldwide LHC Computing Grid (WLCG) grid infrastructure, both for storage and processing operations. However, in recent years, many more resources are available on High Performance Computing (HPC) farms, which generally have many computing nodes with a high number of processors. Large collaborations are working to use these resources in the most efficient way, compatibly with the constraints imposed by computing models (data distributed on the Grid, authentication, software dependencies, etc.). The aim of this thesis project is to develop a software framework that allows users to process a typical data analysis workflow of the ATLAS experiment on HPC systems. The developed analysis framework shall be deployed on the computing resources of the Open Physics Hub project and on the CINECA Marconi100 cluster, in view of the switch-on of the Leonardo supercomputer, foreseen in 2023.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

There are many natural events that can negatively affect the urban ecosystem, but weather-climate variations are certainly among the most significant. The history of settlements has been characterized by extreme events like earthquakes and floods, which repeat themselves at different times, causing extensive damage to the built heritage on a structural and urban scale. Changes in climate also alter various climatic subsystems, changing rainfall regimes and hydrological cycles, increasing the frequency and intensity of extreme precipitation events (heavy rainfall).  From an hydrological risk perspective, it is crucial to understand future events that could occur and their magnitude in order to design safer infrastructures. Unfortunately, it is not easy to understand future scenarios as the complexity of climate is enormous.  For this thesis, precipitation and discharge extremes were primarily used as data sources. It is important to underline that the two data sets are not separated: changes in rainfall regime, due to climate change, could significantly affect overflows into receiving water bodies. It is imperative that we understand and model climate change effects on water structures to support the development of adaptation strategies.   The main purpose of this thesis is to search for suitable water structures for a road located along the Tione River. Therefore, through the analysis of the area from a hydrological point of view, we aim to guarantee the safety of the infrastructure over time.   The observations made have the purpose to underline how models such as a stochastic one can improve the quality of an analysis for design purposes, and influence choices.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

To assess the completeness and reliability of the Information System on Live Births (Sinasc) data. A cross-sectional analysis of the reliability and completeness of Sinasc's data was performed using a sample of Live Birth Certificate (LBC) from 2009, related to births from Campinas, Southeast Brazil. For data analysis, hospitals were grouped according to category of service (Unified National Health System, private or both), 600 LBCs were randomly selected and the data were collected in LBC-copies through mothers and newborns' hospital records and by telephone interviews. The completeness of LBCs was evaluated, calculating the percentage of blank fields, and the LBCs agreement comparing the originals with the copies was evaluated by Kappa and intraclass correlation coefficients. The percentage of completeness of LBCs ranged from 99.8%-100%. For the most items, the agreement was excellent. However, the agreement was acceptable for marital status, maternal education and newborn infants' race/color, low for prenatal visits and presence of birth defects, and very low for the number of deceased children. The results showed that the municipality Sinasc is reliable for most of the studied variables. Investments in training of the professionals are suggested in an attempt to improve system capacity to support planning and implementation of health activities for the benefit of maternal and child population.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Aims: Surgical staple line dehiscence usually leads to severe complications. Several techniques and materials have been used to reinforce this stapling and thus reduce the related complications. The objective was to compare safety of two types of anastomotic reinforcement in open gastric bypass. Methods: A prospective, randomized study comparing an extraluminal suture, fibrin glue, and a nonpermanent buttressing material, Seamguard (R), for staple line reinforcement. Fibrin glue was excluded from the study and analysis after two leaks, requiring surgical reintervention, antibiotic therapy, and prolonged patient hospitalization. Results: Twenty patients were assigned to the suture and Seamguard reinforcement groups. The groups were similar in terms of preoperative characteristics. No staple line dehiscence occurred in the two groups, whereas two cases of dehiscence occurred in the fibrin glue group. No mortality occurred and surgical time was statistically similar for both techniques. Seamguard made the surgery more expensive. Conclusion: In our service, staple line reinforcement in open bariatric surgery with oversewing or Seamguard was considered to be safe. Seamguard application was considered to be easier than oversewing, but more expensive.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Dermatomyositis (DM) and polymyositis (PM) are rare systemic autoimmune rheumatic diseases with high fatality rates. There have been few population-based mortality studies of dermatomyositis and polymyositis in the world, and none have been conducted in Brazil. The objective of the present study was to employ multiple-cause of-death methodology in the analysis of trends in mortality related to dermatomyositis and polymyositis in the state of Sao Paulo, Brazil, between 1985 and 2007. Methods: We analyzed mortality data from the Sao Paulo State Data Analysis System, selecting all death certificates on which DM or PM was listed as a cause of death. The variables sex, age and underlying, associated or total mentions of causes of death were studied using mortality rates, proportions and historical trends. Statistical analysis were performed by chi-square and H Kruskal-Wallis tests, variance analysis and linear regression. A p value less than 0.05 was regarded as significant. Results: Over a 23-year period, there were 318 DM-related deaths and 316 PM-related deaths. Overall, DM/PM was designated as an underlying cause in 55.2% and as an associated cause in 44.8%; among 634 total deaths females accounted for 71.5%. During the study period, age-and gender-adjusted DM mortality rates did not change significantly, although PM as an underlying cause and total mentions of PM trended lower (p < 0.05). The mean ages at death were 47.76 +/- 20.81 years for DM and 54.24 +/- 17.94 years for PM (p = 0.0003). For DM/PM, respectively, as underlying causes, the principal associated causes of death were as follows: pneumonia (in 43.8%/33.5%); respiratory failure (in 34.4%/32.3%); interstitial pulmonary diseases and other pulmonary conditions (in 28.9%/17.6%); and septicemia (in 22.8%/15.9%). For DM/PM, respectively, as associated causes, the following were the principal underlying causes of death: respiratory disorders (in 28.3%/26.0%); circulatory disorders (in 17.4%/20.5%); neoplasms (in 16.7%/13.7%); infectious and parasitic diseases (in 11.6%/9.6%); and gastrointestinal disorders (in 8.0%/4.8%). Of the 318 DM-related deaths, 36 involved neoplasms, compared with 20 of the 316 PM-related deaths (p = 0.03). Conclusions: Our study using multiple cause of deaths found that DM/PM were identified as the underlying cause of death in only 55.2% of the deaths, indicating that both diseases were underestimated in the primary mortality statistics. We observed a predominance of deaths in women and in older individuals, as well as a trend toward stability in the mortality rates. We have confirmed that the risk of death is greater when either disease is accompanied by neoplasm, albeit to lesser degree in individuals with PM. The investigation of the underlying and associated causes of death related to DM/PM broaden the knowledge of the natural history of both diseases and could help integrate mortality data for use in the evaluation of control measures for DM/PM.