956 results for automated analysis
Abstract:
Report published in the Proceedings of the National Conference on "Education and Research in the Information Society", Plovdiv, May, 2015
Abstract:
This dissertation introduces an integrated algorithm for a new application dedicated to discriminating between electrodes that lead to a seizure onset and those that do not, using interictal subdural EEG data. The significance of this study lies in determining why, among channels that all contain interictal spikes, some electrodes eventually lead to seizure while others do not. A first finding in the development of the algorithm is that these interictal spikes had to be asynchronous and located in different regions of the brain before any consequential interpretation of EEG behavioral patterns is possible. A singular merit of the proposed approach is that even when the EEG data is randomly selected (independent of the onset of seizure), we are able to distinguish the channels that lead to seizure from those that do not. It is also revealed that the region of ictal activity does not necessarily evolve from the tissue located at the channels that present interictal activity, as commonly believed. The study is also significant in correlating clinical features of EEG with the patient's source of ictal activity, which comes from a specific subset of the channels that present interictal activity. The contributions of this dissertation emanate from (a) the choice of discriminating parameters used in the implementation, (b) the unique feature space used to optimize the delineation of these two types of electrodes, (c) the development of a back-propagation neural network that automated the decision-making process, and (d) the establishment of mathematical functions that elicited the reasons for this delineation.
Abstract:
This study explores factors related to prompt difficulty in Automated Essay Scoring. The sample was composed of 6,924 students. For each student, there were 1-4 essays, across 20 different writing prompts, for a total of 20,243 essays. The E-rater® v.2 essay scoring engine developed by the Educational Testing Service was used to score the essays. The scoring engine employs a statistical model that incorporates 10 predictors associated with writing characteristics, of which 8 were used. Rasch partial credit analysis was applied to the scores to determine the difficulty levels of prompts. In addition, the scores were used as outcomes in a series of hierarchical linear models (HLM) in which students and prompts constituted the cross-classification levels. This methodology was used to explore the partitioning of the essay score variance. The results indicated significant differences in prompt difficulty levels due to genre. Descriptive prompts, as a group, were found to be more difficult than persuasive prompts. In addition, the essay score variance was partitioned between students and prompts. The amount of the essay score variance that lies between prompts was found to be relatively small (4 to 7 percent). When the essay-level, student-level, and prompt-level predictors were included in the model, it was able to explain almost all of the variance that lies between prompts. Since in most high-stakes writing assessments only 1-2 prompts per student are used, the essay score variance that lies between prompts represents an undesirable or "noise" variation. Identifying factors associated with this "noise" variance may prove to be important for prompt writing and for constructing Automated Essay Scoring mechanisms that weight prompt difficulty when assigning essay scores.
Abstract:
In the wake of the “9-11” terrorist attacks, the U.S. Government has turned to information technology (IT) to address a lack of information sharing among law enforcement agencies. This research determined if and how information-sharing technology helps law enforcement by examining the differences in perception of the value of IT between law enforcement officers who have access to automated regional information sharing and those who do not. It also examined the effect of potential intervening variables, such as user characteristics, training, and experience, on the officers' evaluation of IT. The sample was limited to 588 officers from two sheriff's offices; one of them (the study group) uses information sharing technology, the other (the comparison group) does not. Triangulated methodologies included surveys, interviews, direct observation, and a review of agency records. Data analysis involved the following statistical methods: descriptive statistics, Chi-Square, factor analysis, principal component analysis, Cronbach's Alpha, Mann-Whitney tests, analysis of variance (ANOVA), and Scheffé post hoc analysis. Results indicated a significant difference between groups: the study group perceived information sharing technology as being a greater factor in solving crime and in increasing officer productivity. The study group was more satisfied with the data available to it. As to the number of arrests made, information sharing technology did not make a difference. Analysis of the potential intervening variables revealed several remarkable results. The presence of a strong performance management imperative (in the comparison sheriff's office) appeared to be a factor in case clearances and arrests, technology notwithstanding.
As to the influence of user characteristics, level of education did not influence a user's satisfaction with technology, but user-satisfaction scores differed significantly by years of experience as a law enforcement officer and by amount of computer training, suggesting a significant but weak relationship. Therefore, this study finds that information sharing technology assists law enforcement officers in doing their jobs. It also suggests that other variables, such as computer training, experience, and management climate, should be accounted for when assessing the impact of information technology.
Abstract:
Protecting confidential information from improper disclosure is a fundamental security goal. While encryption and access control are important tools for ensuring confidentiality, they cannot prevent an authorized system from leaking confidential information to its publicly observable outputs, whether inadvertently or maliciously. Hence, secure information flow aims to provide end-to-end control of information flow. Unfortunately, the traditionally adopted policy of noninterference, which forbids all improper leakage, is often too restrictive. Theories of quantitative information flow address this issue by quantifying the amount of confidential information leaked by a system, with the goal of showing that it is intuitively "small" enough to be tolerated. Given such a theory, it is crucial to develop automated techniques for calculating the leakage in a system. This dissertation is concerned with program analysis for calculating the maximum leakage, or capacity, of confidential information in the context of deterministic systems and under three proposed entropy measures of information leakage: Shannon entropy leakage, min-entropy leakage, and g-leakage. In this context, it turns out that calculating the maximum leakage of a program reduces to counting the number of possible outputs that it can produce. The new approach introduced in this dissertation is to determine two-bit patterns, the relationships among pairs of bits in the output; for instance, we might determine that two bits must be unequal. By counting the number of solutions to the two-bit patterns, we obtain an upper bound on the number of possible outputs. Hence, the maximum leakage can be bounded. We first describe a straightforward computation of the two-bit patterns using an automated prover. We then show a more efficient implementation that uses an implication graph to represent the two-bit patterns.
It efficiently constructs the graph through the use of an automated prover, random executions, STP counterexamples, and deductive closure. The effectiveness of our techniques, both in terms of efficiency and accuracy, is shown through a number of case studies found in recent literature.
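The counting argument in this abstract can be illustrated with a small Python sketch. It is not the dissertation's implementation: the function names are hypothetical, and brute-force enumeration stands in for the automated prover and implication graph, so it is only practical for toy programs. It assumes the capacity of a deterministic program is the base-2 logarithm of its number of distinct outputs, as the abstract states, and derives the two-bit patterns directly from an observed output set.

```python
import math
from itertools import product


def count_outputs(program, input_bits):
    """Enumerate every input (toy sizes only) and collect the distinct outputs."""
    return {program(x) for x in range(2 ** input_bits)}


def capacity_bits(outputs):
    """Maximum leakage of a deterministic program: log2 of the output count."""
    return math.log2(len(outputs))


def two_bit_upper_bound(outputs, width):
    """Upper-bound the output count using only pairwise bit relations.

    For each pair of bit positions, record which value combinations occur,
    then count the bit vectors consistent with every recorded pair.
    """
    allowed = {
        (i, j): {((o >> i) & 1, (o >> j) & 1) for o in outputs}
        for i in range(width)
        for j in range(i + 1, width)
    }
    return sum(
        all((bits[i], bits[j]) in pairs for (i, j), pairs in allowed.items())
        for bits in product((0, 1), repeat=width)
    )


# Hypothetical toy program: keeps only bits 1 and 2 of a 4-bit secret.
outs = count_outputs(lambda secret: secret & 0b110, input_bits=4)
print(len(outs), capacity_bits(outs), two_bit_upper_bound(outs, width=3))
```

For the toy program the pairwise bound is exact. It can also be loose: the three-bit even-parity set {000, 011, 101, 110} has four members, yet every pair of bits takes all four value combinations, so the two-bit bound is eight.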
Abstract:
Safeguarding organizations against opportunism and severe deception in computer-mediated communication (CMC) presents a major challenge to CIOs and IT managers. New insights into linguistic cues of deception derive from the speech acts innate to CMC. Applying automated text analysis to archival email exchanges in a CMC system as part of a reward program, we assess the ability of word use (micro-level), message development (macro-level), and intertextual exchange cues (meta-level) to detect severe deception by business partners. We empirically assess the predictive ability of our framework using an ordinal multilevel regression model. Results indicate that deceivers minimize the use of referencing and self-deprecation but include more superfluous descriptions and flattery. Deceitful channel partners also over-structure their arguments and rapidly mimic the linguistic style of the account manager across dyadic e-mail exchanges. Thanks to its diagnostic value, the proposed framework can support firms’ decision-making and guide compliance monitoring system development.
Abstract:
Modern software applications are becoming more dependent on database management systems (DBMSs). DBMSs are usually used as black boxes by software developers. For example, Object-Relational Mapping (ORM) is one of the most popular database abstraction approaches that developers use nowadays. Using ORM, objects in object-oriented languages are mapped to records in the database, and object manipulations are automatically translated to SQL queries. As a result of such conceptual abstraction, developers do not need deep knowledge of databases; however, all too often this abstraction leads to inefficient and incorrect database access code. Thus, this thesis proposes a series of approaches to improve the performance of database-centric software applications that are implemented using ORM. Our approaches focus on troubleshooting and detecting inefficient database accesses (i.e., performance problems) in the source code, and we rank the detected problems based on their severity. We first conduct an empirical study on the maintenance of ORM code in both open source and industrial applications. We find that ORM performance-related configurations are rarely tuned in practice, and there is a need for tools that can help improve and tune the performance of ORM-based applications. Thus, we propose approaches along two dimensions to help developers improve the performance of ORM-based applications: 1) helping developers write more performant ORM code; and 2) helping developers tune ORM configurations. To provide tooling support to developers, we first propose static analysis approaches to detect performance anti-patterns in the source code. We automatically rank the detected anti-pattern instances according to their performance impact. Our study finds that by resolving the detected anti-patterns, the application performance can be improved by 34% on average.
We then discuss our experience and lessons learned when integrating our anti-pattern detection tool into industrial practice. We hope our experience can help improve the industrial adoption of future research tools. However, as static analysis approaches are prone to false positives and lack runtime information, we also propose dynamic analysis approaches to further help developers improve the performance of their database access code. We propose automated approaches to detect redundant data access anti-patterns in the database access code, and our study finds that resolving such redundant data access anti-patterns can improve application performance by an average of 17%. Finally, we propose an automated approach to tune performance-related ORM configurations using both static and dynamic analysis. Our study shows that our approach can help improve application throughput by 27-138%. Through our case studies on real-world applications, we show that all of our proposed approaches can provide valuable support to developers and help improve application performance significantly.
Abstract:
Introduction: Quantitative and accurate measurements of fat and muscle in the body are important for prevention and diagnosis of diseases related to obesity and muscle degeneration. Manually segmenting muscle and fat compartments in MR body images is laborious and time-consuming, hindering implementation in large cohorts. In the present study, the feasibility and success rate of a Dixon-based MR scan followed by an intensity-normalised, non-rigid, multi-atlas based segmentation was investigated in a cohort of 3,000 subjects. Materials and Methods: 3,000 participants in the in-depth phenotyping arm of the UK Biobank imaging study underwent a comprehensive MR examination. All subjects were scanned using a 1.5 T MR scanner with the dual-echo Dixon Vibe protocol, covering neck to knees. Subjects were scanned with six slabs in supine position, without localizer. Automated body composition analysis was performed using the AMRA Profiler™ system to segment and quantify visceral adipose tissue (VAT), abdominal subcutaneous adipose tissue (ASAT) and thigh muscles. Technical quality assurance was performed and a standard set of acceptance/rejection criteria was established. Descriptive statistics were calculated for all volume measurements and quality assurance metrics. Results: Of the 3,000 subjects, 2,995 (99.83%) were analysable for body fat, 2,828 (94.27%) were analysable when body fat and one thigh were included, and 2,775 (92.50%) were fully analysable for body fat and both thigh muscles. Datasets that could not be analysed were mainly due to missing slabs in the acquisition, or to the patient being positioned so that large parts of the volume were outside the field of view. Discussion and Conclusions: This study showed that the rapid UK Biobank MR protocol was well tolerated by most subjects and sufficiently robust to achieve a very high success rate for body composition analysis. This research has been conducted using the UK Biobank Resource.
Abstract:
This paper examines the integration of a tolerance design process within the Computer-Aided Design (CAD) environment, having identified the potential to create an intelligent Digital Mock-Up (DMU) [1]. The tolerancing process is complex in nature, and as such, reliance on Computer-Aided Tolerancing (CAT) software and domain experts can create a disconnect between the design and manufacturing disciplines. It is necessary to implement the tolerance design procedure at the earliest opportunity to integrate both disciplines and to reduce the workload in tolerance analysis and allocation at critical stages in product development when production is imminent.
The work seeks to develop a methodology that will allow for a preliminary tolerance allocation procedure within CAD. An approach to tolerance allocation based on sensitivity analysis is implemented on a simple assembly to review its contribution to an intelligent DMU. The procedure is developed using Python scripting for CATIA V5, with analysis results aligning with those in literature. A review of its implementation and requirements is presented.
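As a rough illustration of sensitivity-based tolerance allocation, the following Python sketch estimates sensitivities by central finite differences, computes a worst-case stack-up, and scales a set of starting tolerances uniformly to meet an assembly target. It is a generic sketch and not the paper's procedure: it does not use the CATIA V5 scripting interface, and the gap function, nominal dimensions, and function names are all hypothetical.

```python
def sensitivities(assembly_fn, nominal, step=1e-6):
    """Central-difference estimate of d(assembly response)/d(dimension i)."""
    result = []
    for i in range(len(nominal)):
        up = list(nominal)
        down = list(nominal)
        up[i] += step
        down[i] -= step
        result.append((assembly_fn(up) - assembly_fn(down)) / (2 * step))
    return result


def worst_case_stack(sens, tolerances):
    """Worst-case assembly variation: sum of |sensitivity| * tolerance."""
    return sum(abs(s) * t for s, t in zip(sens, tolerances))


def allocate_proportional(sens, initial_tolerances, target):
    """Scale all tolerances by one factor so the worst-case stack equals the target."""
    scale = target / worst_case_stack(sens, initial_tolerances)
    return [t * scale for t in initial_tolerances]


# Hypothetical assembly: gap = housing length - part A length - part B length.
def gap(dims):
    housing, part_a, part_b = dims
    return housing - part_a - part_b


sens = sensitivities(gap, [50.0, 20.0, 29.5])
allocated = allocate_proportional(sens, [0.1, 0.1, 0.1], target=0.15)
print(sens, allocated)
```

Uniform scaling is only one possible allocation rule; cost-weighted or statistical (root-sum-square) schemes would change `worst_case_stack` and `allocate_proportional` but not the sensitivity step.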
Abstract:
Malware detection is a growing problem, particularly on the Android mobile platform, due to its increasing popularity and the accessibility of numerous third-party app markets. This has been made worse by the increasingly sophisticated detection avoidance techniques employed by emerging malware families. This calls for more effective techniques for detection and classification of Android malware. Hence, in this paper we present an n-opcode analysis based approach that utilizes machine learning to classify and categorize Android malware. This approach enables automated feature discovery that eliminates the need for applying expert or domain knowledge to define the needed features. Our experiments on 2520 samples, performed using up to 10-gram opcode features, showed that an f-measure of 98% is achievable using this approach.
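The n-opcode idea can be sketched as plain n-gram counting over an opcode sequence. The Python snippet below is a minimal illustration and not the paper's pipeline: the function name is hypothetical, the sample sequence is invented, and disassembly, feature selection, and the classifier itself are left out.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Count every run of n consecutive opcodes in a sequence."""
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )


# Invented opcode sequence for illustration (Dalvik-style opcode names).
sample = [
    "const/4", "invoke-virtual", "move-result",
    "invoke-virtual", "move-result", "return-void",
]
features = opcode_ngrams(sample, n=2)
print(features.most_common(1))
```

Because every n-gram that occurs becomes a feature automatically, no expert has to decide in advance which opcode patterns matter; in practice the resulting counts would be vectorised and passed to a learning algorithm.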
Abstract:
Safety on public transport is a major concern for the relevant authorities. We address this issue by proposing an automated surveillance platform which combines data from video, infrared and pressure sensors. Data homogenisation and integration is achieved by a distributed architecture based on communication middleware that resolves interconnection issues, thereby enabling data modelling. A common-sense knowledge base models and encodes knowledge about public-transport platforms and the actions and activities of passengers. Trajectory data from passengers is modelled as a time-series of human activities. Common-sense knowledge and rules are then applied to detect inconsistencies or errors in the data interpretation. Lastly, the rationality that characterises human behaviour is also captured here through a bottom-up Hierarchical Task Network planner that, along with common-sense, corrects misinterpretations to explain passenger behaviour. The system is validated using a simulated bus saloon scenario as a case-study. Eighteen video sequences were recorded with up to six passengers. Four metrics were used to evaluate performance. The system, with an accuracy greater than 90% for each of the four metrics, was found to outperform a rule-based system and a system containing planning alone.
Abstract:
Companies face new challenges almost every day. In order to stay competitive, it is important that companies strive for continuous development and improvement. By describing companies through their processes, it is possible to get a clear overview of the entire operation, which can contribute to a well-established overall understanding of the company. This is a case study of Stort AB, a small logistics company specialized in international transportation and logistics solutions. The purpose of this study is to perform value stream mapping in order to create a more efficient production process and to propose possible improvements that reduce processing time. After value stream mapping, data envelopment analysis is used to calculate how lean Stort AB is today and how lean the company can become by implementing the proposed improvements. The results show that the efficiency of the production process can be improved by minimizing the waste produced by a bad workplace layout and over-processing. The authors' suggested solution is to introduce standardized processes and invest in technical instruments in order to automate the process and reduce process time. According to the data envelopment analysis, the business is 41 percent lean at present, may soon become 55 percent lean, and could finally reach an optimum of 100 percent lean if the process is automated.
Abstract:
Carbonic anhydrases are enzymes found ubiquitously in all organisms; they catalyze the hydration of carbon dioxide to form bicarbonate and a proton, and the reverse reaction. They are crucial in respiration, bone resorption, pH regulation, ion transport, and photosynthesis in plants. Of the five classes of carbonic anhydrase (α, β, γ, δ, ζ), this study focused on the α carbonic anhydrases. This class of CAs consists of 16 subfamilies in mammals, including 3 non-active enzymes known as Carbonic Anhydrase Related Proteins. The inactivity of these enzymes is due to the loss of one or more histidine residues in the active site. The aim of this thesis was an evolutionary analysis of carbonic anhydrase sequences from organisms spanning from the Cambrian age. It was carried out in two phases. The first phase was sequence collection, which drew on many biological sequence databases as sources. This phase included sequence alignment and analysis of the sequences, both manually and in automated form using a few analysis tools. The second phase was phylogenetic analysis and exploration of the subcellular location of the proteins, which was key for the evolutionary analysis. Through the methods applied in these two phases, it was possible to accomplish the desired result. Certain thought-provoking sequences were encountered and analyzed thoroughly. The phylogenetic analysis showed interesting results that bolster previous findings, as well as new findings that lay the groundwork for future, more intensive studies.
Abstract:
As a way to gain greater insights into the operation of online communities, this dissertation applies automated text mining techniques to text-based communication to identify, describe and evaluate underlying social networks among online community members. The main thrust of the study is to automate the discovery of social ties that form between community members, using only the digital footprints left behind in their online forum postings. Currently, one of the most common but time-consuming methods for discovering social ties between people is to ask questions about their perceived social ties. However, such survey data is difficult to collect due to the high investment in time associated with data collection and the sensitive nature of the types of questions that may be asked. To overcome these limitations, the dissertation presents a new, content-based method for automated discovery of social networks from threaded discussions, referred to as ‘name network’. As a case study, the proposed automated method is evaluated in the context of online learning communities. The results suggest that the proposed ‘name network’ method for collecting social network data is a viable alternative to costly and time-consuming collection of users’ data using surveys. The study also demonstrates how social networks produced by the ‘name network’ method can be used to study online classes and to look for evidence of collaborative learning in online learning communities. For example, educators can use name networks as a real-time diagnostic tool to identify students who might need additional help or students who may provide such help to others. Future research will evaluate the usefulness of the ‘name network’ method in other types of online communities.
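The core intuition of a content-based network can be sketched in a few lines of Python: draw a directed tie from a post's author to every community member named in the post. This is only a simplified illustration under strong assumptions, not the dissertation's method: the function name and sample posts are hypothetical, names are matched as exact whole words against a known roster, and issues such as nicknames, shared names, or quoted text are ignored.

```python
import re
from collections import Counter


def build_name_network(posts, roster):
    """Count directed ties (author -> mentioned member) across forum posts."""
    ties = Counter()
    for author, text in posts:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for member in roster:
            if member != author and member.lower() in words:
                ties[(author, member)] += 1
    return ties


# Invented forum posts for illustration.
posts = [
    ("Ann", "Thanks Bob, I agree with your point."),
    ("Bob", "Ann and Cara raised good questions."),
    ("Cara", "I agree with Ann."),
]
network = build_name_network(posts, roster=["Ann", "Bob", "Cara"])
print(network)
```

The resulting tie counts form a weighted edge list that could be loaded into any social network analysis tool, for example to spot members who are rarely named by others.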
Abstract:
The aim of this thesis was threefold: first, to compare current player tracking technologies in a single game of soccer; second, to investigate the running requirements of elite women’s soccer, in particular the use and application of athlete tracking devices; and finally, to examine how game style can be quantified and defined. Study One compared four different match analysis systems commonly used in both research and applied settings: video-based time-motion analysis, a semi-automated multiple camera based system, and two commercially available Global Positioning System (GPS) based player tracking systems at 1 Hertz (Hz) and 5 Hz respectively. A comparison was made between each of the systems when recording the same game. Total distance covered during the match for the four systems ranged from 10 830 ± 770 m (semi-automated multiple camera based system) to 9 510 ± 740 m (video-based time-motion analysis). At running speeds categorised as high-intensity running (>15 km·h⁻¹), the semi-automated multiple camera based system reported the highest distance (2 650 ± 530 m) and video-based time-motion analysis the lowest (1 610 ± 370 m). At speeds considered to be sprinting (>20 km·h⁻¹), video-based time-motion analysis reported the highest value (420 ± 170 m) and the 1 Hz GPS units the lowest (230 ± 160 m). These results demonstrate that there are differences in the determination of absolute distances, and that comparison of results between match analysis systems should be made with caution. Currently, there is no criterion measure for these match analysis methods, and as such it was not possible to determine whether one system was more accurate than another. Study Two provided an opportunity to apply player-tracking technology (GPS) to measure activity profiles and determine the physical demands of Australian international level women soccer players.
In four international women’s soccer games, data was collected on a total of 15 Australian women soccer players using a 5 Hz GPS based athlete tracking device. Results indicated that Australian women soccer players covered 9 140 ± 1 030 m during 90 min of play. The total distance covered by Australian women was less than the 10 300 m reportedly covered by female soccer players in the Danish First Division. However, there was no apparent difference in the estimated maximal oxygen uptake (VO2max), as measured by multi-stage shuttle tests, between these studies. This study suggests that contextual information, including the “game style” of both the team and opposition, may influence physical performance in games. Study Three examined the effect the level of the opposition had on the physical output of Australian women soccer players. In total, 58 game files from 5 Hz athlete-tracking devices from 13 international matches were collected. These files were analysed to examine relationships between physical demands, represented by total distance covered, high intensity running (HIR) and distances covered sprinting, and the level of the opposition, as represented by the Fédération Internationale de Football Association (FIFA) ranking at the time of the match. Higher-ranking opponents elicited less high-speed running and greater low-speed activity compared to playing teams of similar or lower ranking. The results are important to coaches and practitioners in the preparation of players for international competition, and showed that the differing physical demands required were dependent on the level of the opponents. The results also highlighted the need for continued research in the area of integrating contextual information in team sports and demonstrated that soccer can be described as having dynamic and interactive systems. The influence of playing strategy, tactics and subsequently the overall game style was highlighted as playing a significant part in the physical demands of the players.
Study Four explored the concept of game style in field sports such as soccer. The aim of this study was to provide an applied framework with suggested metrics for use by coaches, media, practitioners and sports scientists. Based on the findings of Studies 1-3 and a systematic review of the relevant literature, a theoretical framework was developed to better understand how a team’s game style could be quantified. Soccer games can be broken into key moments of play, and for each of these moments we categorised metrics that provide insight into success or otherwise, to help quantify and measure different playing styles. This study highlights that, to date, there has been no clear definition of game style in team sports, and as such a novel definition of game style is proposed that can be used by coaches, sport scientists, performance analysts, media and the general public. Studies 1-3 outline four common methods of measuring the physical demands in soccer: video-based time-motion analysis, GPS at 1 Hz and at 5 Hz, and semi-automated multiple camera based systems. As there are no semi-automated multiple camera based systems available in Australia, primarily due to cost and logistical reasons, GPS is widely accepted for use in team sports in tracking player movements in training and competition environments. This research identified that, although there are some limitations, GPS player-tracking technology may be a valuable tool in assessing running demands in soccer players and subsequently contribute to our understanding of game style. The results of the research undertaken also reinforce the differences between methods used to analyse player movement patterns in field sports such as soccer and demonstrate that the results from different systems, such as GPS based athlete tracking devices and semi-automated multiple camera based systems, cannot be used interchangeably.
Indeed, the magnitude of measurement differences between methods suggests that significant measurement error is evident. This was apparent even when the same technology was used at different sampling rates, such as GPS systems measuring at either 1 Hz or 5 Hz. It was also recognised that other factors influence how team sport athletes behave within an interactive system. These factors included the strength of the opposition and their style of play. In turn, these can affect the physical demands on players, which change from game to game, and even within games, depending on these contextual features. Finally, the concept of game style and how it might be measured was examined. Game style was defined as "the characteristic playing pattern demonstrated by a team during games. It will be regularly repeated in specific situational contexts such that measurement of variables reflecting game style will be relatively stable. Variables of importance are player and ball movements, interaction of players, and will generally involve elements of speed, time and space (location)".
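To make the speed-zone distances reported in Study One concrete, the following Python sketch shows one way such figures could be derived from positional samples taken at a fixed rate. It is a hedged illustration, not the processing used by any of the tracking systems in the thesis: the function name and sample track are hypothetical, positions are assumed to be planar coordinates in metres, and the 15 and 20 km·h⁻¹ thresholds are taken from the abstract and applied independently.

```python
import math


def zone_distances(points, hz, hir_kmh=15.0, sprint_kmh=20.0):
    """Return (total, high-intensity, sprint) distance in metres.

    points: sequence of (x, y) positions in metres, sampled at `hz` samples/s.
    """
    total = hir = sprint = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = math.hypot(x1 - x0, y1 - y0)   # metres covered in one sample
        speed_kmh = step * hz * 3.6           # m per sample -> m/s -> km/h
        total += step
        if speed_kmh > hir_kmh:
            hir += step
        if speed_kmh > sprint_kmh:
            sprint += step
    return total, hir, sprint


# Invented 1 Hz track: steps of 2 m, 5 m and 6 m along a straight line.
track = [(0.0, 0.0), (2.0, 0.0), (7.0, 0.0), (13.0, 0.0)]
print(zone_distances(track, hz=1))
```

Because speed is inferred from the distance between consecutive samples, the sampling rate directly shapes which steps cross a threshold, which is one reason results from 1 Hz and 5 Hz devices are not directly comparable.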