901 resultados para Techniques of data analysis
Resumo:
In this paper we discuss the use of digital data by the Swiss Federal Criminal Court in a recent case of attempted homicide. We use this case to examine drawbacks for the defense when the presentation of scientific evidence is partial, especially when the only perspective mentioned is that of the prosecution. We tackle this discussion at two distinct levels. First, we pursue an essentially non-technical presentation of the topic by drawing parallels between the court's summing up of the case and flawed patterns of reasoning commonly seen in other forensic disciplines, such as DNA and particle traces (e.g., gunshot residues). Then, we propose a formal analysis of the case, using elements of probability and graphical probability models, to justify our main claim that the partial presentation of digital evidence poses a risk to the administration of justice in that it keeps vital information from the defense. We will argue that such practice constitutes a violation of general principles of forensic interpretation as established by forensic science literature and current recommendations by forensic science interest groups (e.g., the European Network of Forensic Science Institutes). Finally, we posit that argument construction and analysis using formal methods can help replace digital evidence appropriately into context and thus support a sound evaluation of the evidence.
Resumo:
AbstractObjective:To evaluate the association between Hashimoto's thyroiditis (HT) and papillary thyroid carcinoma (PTC).Materials and Methods:The patients were evaluated by ultrasonography-guided fine needle aspiration cytology. Typical cytopathological aspects and/or classical histopathological findings were taken into consideration in the diagnosis of HT, and only histopathological results were considered in the diagnosis of PTC.Results:Among 1,049 patients with multi- or uninodular goiter (903 women and 146 men), 173 (16.5%) had cytopathological features of thyroiditis. Thirty-three (67.4%) out of the 49 operated patients had PTC, 9 (27.3%) of them with histopathological features of HT. Five (31.3%) out of the 16 patients with non-malignant disease also had HT. In the groups with HT, PTC, and PCT+HT, the female prevalence rate was 100%, 91.6%, and 77.8%, respectively. Mean age was 41.5, 43.3, and 48.5 years, respectively. No association was observed between the two diseases in the present study where HT occurred in 31.1% of the benign cases and in 27.3% of malignant cases (p = 0.8).Conclusion:In spite of the absence of association between HT and PCT, the possibility of malignancy in HT should always be considered because of the coexistence of the two diseases already reported in the literature.
Resumo:
This study is dedicated to search engine marketing (SEM). It aims for developing a business model of SEM firms and to provide explicit research of trustworthy practices of virtual marketing companies. Optimization is a general term that represents a variety of techniques and methods of the web pages promotion. The research addresses optimization as a business activity, and it explains its role for the online marketing. Additionally, it highlights issues of unethical techniques utilization by marketers which created relatively negative attitude to them on the Internet environment. Literature insight combines in the one place both technical and economical scientific findings in order to highlight technological and business attributes incorporated in SEM activities. Empirical data regarding search marketers was collected via e-mail questionnaires. 4 representatives of SEM companies were engaged in this study to accomplish the business model design. Additionally, the fifth respondent was a representative of the search engine portal, who provided insight on relations between search engines and marketers. Obtained information of the respondents was processed qualitatively. Movement of commercial organizations to the online market increases demand on promotional programs. SEM is the largest part of online marketing, and it is a prerogative of search engines portals. However, skilled users, or marketers, are able to implement long-term marketing programs by utilizing web page optimization techniques, key word consultancy or content optimization to increase web site visibility to search engines and, therefore, user’s attention to the customer pages. SEM firms are related to small knowledge-intensive businesses. On the basis of data analysis the business model was constructed. The SEM model includes generalized constructs, although they represent a wider amount of operational aspects. Constructing blocks of the model includes fundamental parts of SEM commercial activity: value creation, customer, infrastructure and financial segments. Also, approaches were provided on company’s differentiation and competitive advantages evaluation. It is assumed that search marketers should apply further attempts to differentiate own business out of the large number of similar service providing companies. Findings indicate that SEM companies are interested in the increasing their trustworthiness and the reputation building. Future of the search marketing is directly depending on search engines development.
Resumo:
Data is the most important asset of a company in the information age. Other assets, such as technology, facilities or products can be copied or reverse-engineered, employees can be brought over, but data remains unique to every company. As data management topics are slowly moving from unknown unknowns to known unknowns, tools to evaluate and manage data properly are developed and refined. Many projects are in progress today to develop various maturity models for evaluating information and data management practices. These maturity models come in many shapes and sizes: from short and concise ones meant for a quick assessment, to complex ones that call for an expert assessment by experienced consultants. In this paper several of them, made not only by external inter-organizational groups and authors, but also developed internally at a Major Energy Provider Company (MEPC) are juxtaposed and thoroughly analyzed. Apart from analyzing the available maturity models related to Data Management, this paper also selects the one with the most merit and describes and analyzes using it to perform a maturity assessment in MEPC. The utility of maturity models is two-fold: descriptive and prescriptive. Besides recording the current state of Data Management practices maturity by performing the assessments, this maturity model is also used to chart the way forward. Thus, after the current situation is presented, analysis and recommendations on how to improve it based on the definitions of higher levels of maturity are given. Generally, the main trend observed was the widening of the Data Management field to include more business and “soft” areas (as opposed to technical ones) and the change of focus towards business value of data, while assuming that the underlying IT systems for managing data are “ideal”, that is, left to the purely technical disciplines to design and maintain. This trend is not only present in Data Management but in other technological areas as well, where more and more attention is given to innovative use of technology, while acknowledging that the strategic importance of IT as such is diminishing.
Resumo:
This paper presents the development of a two-dimensional interactive software environment for structural analysis and optimization based on object-oriented programming using the C++ language. The main feature of the software is the effective integration of several computational tools into graphical user interfaces implemented in the Windows-98 and Windows-NT operating systems. The interfaces simplify data specification in the simulation and optimization of two-dimensional linear elastic problems. NURBS have been used in the software modules to represent geometric and graphical data. Extensions to the analysis of three-dimensional problems have been implemented and are also discussed in this paper.
Resumo:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Resumo:
Identification of low-dimensional structures and main sources of variation from multivariate data are fundamental tasks in data analysis. Many methods aimed at these tasks involve solution of an optimization problem. Thus, the objective of this thesis is to develop computationally efficient and theoretically justified methods for solving such problems. Most of the thesis is based on a statistical model, where ridges of the density estimated from the data are considered as relevant features. Finding ridges, that are generalized maxima, necessitates development of advanced optimization methods. An efficient and convergent trust region Newton method for projecting a point onto a ridge of the underlying density is developed for this purpose. The method is utilized in a differential equation-based approach for tracing ridges and computing projection coordinates along them. The density estimation is done nonparametrically by using Gaussian kernels. This allows application of ridge-based methods with only mild assumptions on the underlying structure of the data. The statistical model and the ridge finding methods are adapted to two different applications. The first one is extraction of curvilinear structures from noisy data mixed with background clutter. The second one is a novel nonlinear generalization of principal component analysis (PCA) and its extension to time series data. The methods have a wide range of potential applications, where most of the earlier approaches are inadequate. Examples include identification of faults from seismic data and identification of filaments from cosmological data. Applicability of the nonlinear PCA to climate analysis and reconstruction of periodic patterns from noisy time series data are also demonstrated. Other contributions of the thesis include development of an efficient semidefinite optimization method for embedding graphs into the Euclidean space. The method produces structure-preserving embeddings that maximize interpoint distances. It is primarily developed for dimensionality reduction, but has also potential applications in graph theory and various areas of physics, chemistry and engineering. Asymptotic behaviour of ridges and maxima of Gaussian kernel densities is also investigated when the kernel bandwidth approaches infinity. The results are applied to the nonlinear PCA and to finding significant maxima of such densities, which is a typical problem in visual object tracking.
Resumo:
Results of subgroup analysis (SA) reported in randomized clinical trials (RCT) cannot be adequately interpreted without information about the methods used in the study design and the data analysis. Our aim was to show how often inaccurate or incomplete reports occur. First, we selected eight methodological aspects of SA on the basis of their importance to a reader in determining the confidence that should be placed in the author's conclusions regarding such analysis. Then, we reviewed the current practice of reporting these methodological aspects of SA in clinical trials in four leading journals, i.e., the New England Journal of Medicine, the Journal of the American Medical Association, the Lancet, and the American Journal of Public Health. Eight consecutive reports from each journal published after July 1, 1998 were included. Of the 32 trials surveyed, 17 (53%) had at least one SA. Overall, the proportion of RCT reporting a particular methodological aspect ranged from 23 to 94%. Information on whether the SA preceded/followed the analysis was reported in only 7 (41%) of the studies. Of the total possible number of items to be reported, NEJM, JAMA, Lancet and AJPH clearly mentioned 59, 67, 58 and 72%, respectively. We conclude that current reporting of SA in RCT is incomplete and inaccurate. The results of such SA may have harmful effects on treatment recommendations if accepted without judicious scrutiny. We recommend that editors improve the reporting of SA in RCT by giving authors a list of the important items to be reported.
Resumo:
The recent rapid development of biotechnological approaches has enabled the production of large whole genome level biological data sets. In order to handle thesedata sets, reliable and efficient automated tools and methods for data processingand result interpretation are required. Bioinformatics, as the field of studying andprocessing biological data, tries to answer this need by combining methods and approaches across computer science, statistics, mathematics and engineering to studyand process biological data. The need is also increasing for tools that can be used by the biological researchers themselves who may not have a strong statistical or computational background, which requires creating tools and pipelines with intuitive user interfaces, robust analysis workflows and strong emphasis on result reportingand visualization. Within this thesis, several data analysis tools and methods have been developed for analyzing high-throughput biological data sets. These approaches, coveringseveral aspects of high-throughput data analysis, are specifically aimed for gene expression and genotyping data although in principle they are suitable for analyzing other data types as well. Coherent handling of the data across the various data analysis steps is highly important in order to ensure robust and reliable results. Thus,robust data analysis workflows are also described, putting the developed tools andmethods into a wider context. The choice of the correct analysis method may also depend on the properties of the specific data setandthereforeguidelinesforchoosing an optimal method are given. The data analysis tools, methods and workflows developed within this thesis have been applied to several research studies, of which two representative examplesare included in the thesis. The first study focuses on spermatogenesis in murinetestis and the second one examines cell lineage specification in mouse embryonicstem cells.
Resumo:
This research concerns the Urban Living Idea Contest conducted by Creator Space™ of BASF SE during its 150th anniversary in 2015. The main objectives of the thesis are to provide a comprehensive analysis of the Urban Living Idea Contest (ULIC) and propose a number of improvement suggestions for future years. More than 4,000 data points were collected and analyzed to investigate the functionality of different elements of the contest. Furthermore, a set of improvement suggestions were proposed to BASF SE. Novelty of this thesis lies in the data collection and the original analysis of the contest, which identified its critical elements, as well as the areas that could be improved. The author of this research was a member of the organizing team and involved in the decision making process from the beginning until the end of the ULIC.
Resumo:
Behavioral researchers commonly use single subject designs to evaluate the effects of a given treatment. Several different methods of data analysis are used, each with their own set of methodological strengths and limitations. Visual inspection is commonly used as a method of analyzing data which assesses the variability, level, and trend both within and between conditions (Cooper, Heron, & Heward, 2007). In an attempt to quantify treatment outcomes, researchers developed two methods for analysing data called Percentage of Non-overlapping Data Points (PND) and Percentage of Data Points Exceeding the Median (PEM). The purpose of the present study is to compare and contrast the use of Hierarchical Linear Modelling (HLM), PND and PEM in single subject research. The present study used 39 behaviours, across 17 participants to compare treatment outcomes of a group cognitive behavioural therapy program, using PND, PEM, and HLM on three response classes of Obsessive Compulsive Behaviour in children with Autism Spectrum Disorder. Findings suggest that PEM and HLM complement each other and both add invaluable information to the overall treatment results. Future research should consider using both PEM and HLM when analysing single subject designs, specifically grouped data with variability.
Resumo:
Data mining is one of the hottest research areas nowadays as it has got wide variety of applications in common man’s life to make the world a better place to live. It is all about finding interesting hidden patterns in a huge history data base. As an example, from a sales data base, one can find an interesting pattern like “people who buy magazines tend to buy news papers also” using data mining. Now in the sales point of view the advantage is that one can place these things together in the shop to increase sales. In this research work, data mining is effectively applied to a domain called placement chance prediction, since taking wise career decision is so crucial for anybody for sure. In India technical manpower analysis is carried out by an organization named National Technical Manpower Information System (NTMIS), established in 1983-84 by India's Ministry of Education & Culture. The NTMIS comprises of a lead centre in the IAMR, New Delhi, and 21 nodal centres located at different parts of the country. The Kerala State Nodal Centre is located at Cochin University of Science and Technology. In Nodal Centre, they collect placement information by sending postal questionnaire to passed out students on a regular basis. From this raw data available in the nodal centre, a history data base was prepared. Each record in this data base includes entrance rank ranges, reservation, Sector, Sex, and a particular engineering. From each such combination of attributes from the history data base of student records, corresponding placement chances is computed and stored in the history data base. From this data, various popular data mining models are built and tested. These models can be used to predict the most suitable branch for a particular new student with one of the above combination of criteria. Also a detailed performance comparison of the various data mining models is done.This research work proposes to use a combination of data mining models namely a hybrid stacking ensemble for better predictions. A strategy to predict the overall absorption rate for various branches as well as the time it takes for all the students of a particular branch to get placed etc are also proposed. Finally, this research work puts forward a new data mining algorithm namely C 4.5 * stat for numeric data sets which has been proved to have competent accuracy over standard benchmarking data sets called UCI data sets. It also proposes an optimization strategy called parameter tuning to improve the standard C 4.5 algorithm. As a summary this research work passes through all four dimensions for a typical data mining research work, namely application to a domain, development of classifier models, optimization and ensemble methods.
Resumo:
We take stock of the present position of compositional data analysis, of what has been achieved in the last 20 years, and then make suggestions as to what may be sensible avenues of future research. We take an uncompromisingly applied mathematical view, that the challenge of solving practical problems should motivate our theoretical research; and that any new theory should be thoroughly investigated to see if it may provide answers to previously abandoned practical considerations. Indeed a main theme of this lecture will be to demonstrate this applied mathematical approach by a number of challenging examples
Resumo:
One of the disadvantages of old age is that there is more past than future: this, however, may be turned into an advantage if the wealth of experience and, hopefully, wisdom gained in the past can be reflected upon and throw some light on possible future trends. To an extent, then, this talk is necessarily personal, certainly nostalgic, but also self critical and inquisitive about our understanding of the discipline of statistics. A number of almost philosophical themes will run through the talk: search for appropriate modelling in relation to the real problem envisaged, emphasis on sensible balances between simplicity and complexity, the relative roles of theory and practice, the nature of communication of inferential ideas to the statistical layman, the inter-related roles of teaching, consultation and research. A list of keywords might be: identification of sample space and its mathematical structure, choices between transform and stay, the role of parametric modelling, the role of a sample space metric, the underused hypothesis lattice, the nature of compositional change, particularly in relation to the modelling of processes. While the main theme will be relevance to compositional data analysis we shall point to substantial implications for general multivariate analysis arising from experience of the development of compositional data analysis…
Resumo:
Examples of compositional data. The simplex, a suitable sample space for compositional data and Aitchison's geometry. R, a free language and environment for statistical computing and graphics