950 resultados para Software repository mining. Process mining. Software developer contribution


Relevância:

50.00% 50.00%

Publicador:

Resumo:

With the explosive growth of the volume and complexity of document data (e.g., news, blogs, web pages), it has become a necessity to semantically understand documents and deliver meaningful information to users. Areas dealing with these problems are crossing data mining, information retrieval, and machine learning. For example, document clustering and summarization are two fundamental techniques for understanding document data and have attracted much attention in recent years. Given a collection of documents, document clustering aims to partition them into different groups to provide efficient document browsing and navigation mechanisms. One unrevealed area in document clustering is that how to generate meaningful interpretation for the each document cluster resulted from the clustering process. Document summarization is another effective technique for document understanding, which generates a summary by selecting sentences that deliver the major or topic-relevant information in the original documents. How to improve the automatic summarization performance and apply it to newly emerging problems are two valuable research directions. To assist people to capture the semantics of documents effectively and efficiently, the dissertation focuses on developing effective data mining and machine learning algorithms and systems for (1) integrating document clustering and summarization to obtain meaningful document clusters with summarized interpretation, (2) improving document summarization performance and building document understanding systems to solve real-world applications, and (3) summarizing the differences and evolution of multiple document sources.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Due to the rapid advances in computing and sensing technologies, enormous amounts of data are being generated everyday in various applications. The integration of data mining and data visualization has been widely used to analyze these massive and complex data sets to discover hidden patterns. For both data mining and visualization to be effective, it is important to include the visualization techniques in the mining process and to generate the discovered patterns for a more comprehensive visual view. In this dissertation, four related problems: dimensionality reduction for visualizing high dimensional datasets, visualization-based clustering evaluation, interactive document mining, and multiple clusterings exploration are studied to explore the integration of data mining and data visualization. In particular, we 1) propose an efficient feature selection method (reliefF + mRMR) for preprocessing high dimensional datasets; 2) present DClusterE to integrate cluster validation with user interaction and provide rich visualization tools for users to examine document clustering results from multiple perspectives; 3) design two interactive document summarization systems to involve users efforts and generate customized summaries from 2D sentence layouts; and 4) propose a new framework which organizes the different input clusterings into a hierarchical tree structure and allows for interactive exploration of multiple clustering solutions.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Concurrent software executes multiple threads or processes to achieve high performance. However, concurrency results in a huge number of different system behaviors that are difficult to test and verify. The aim of this dissertation is to develop new methods and tools for modeling and analyzing concurrent software systems at design and code levels. This dissertation consists of several related results. First, a formal model of Mondex, an electronic purse system, is built using Petri nets from user requirements, which is formally verified using model checking. Second, Petri nets models are automatically mined from the event traces generated from scientific workflows. Third, partial order models are automatically extracted from some instrumented concurrent program execution, and potential atomicity violation bugs are automatically verified based on the partial order models using model checking. Our formal specification and verification of Mondex have contributed to the world wide effort in developing a verified software repository. Our method to mine Petri net models automatically from provenance offers a new approach to build scientific workflows. Our dynamic prediction tool, named McPatom, can predict several known bugs in real world systems including one that evades several other existing tools. McPatom is efficient and scalable as it takes advantage of the nature of atomicity violations and considers only a pair of threads and accesses to a single shared variable at one time. However, predictive tools need to consider the tradeoffs between precision and coverage. Based on McPatom, this dissertation presents two methods for improving the coverage and precision of atomicity violation predictions: 1) a post-prediction analysis method to increase coverage while ensuring precision; 2) a follow-up replaying method to further increase coverage. Both methods are implemented in a completely automatic tool.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Online Social Network (OSN) services provided by Internet companies bring people together to chat, share the information, and enjoy the information. Meanwhile, huge amounts of data are generated by those services (they can be regarded as the social media ) every day, every hour, even every minute, and every second. Currently, researchers are interested in analyzing the OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, due to the large-scale property of the OSN data, it is difficult to effectively analyze it. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components in the social media data — users and user-generated contents. Specifically, it aims at addressing three problems related to the social media users and contents: (1) how does one organize the users and the contents? (2) how does one summarize the textual contents so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in the social media to benefit other applications, e.g., Marketing Campaign? The contribution of this dissertation is briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze the users and user-generated contents from the social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and contents. (3) It proposes multi-document summarization methods to extract core information from the social network contents. (4) It introduces three important dimensions of social influence, and a dynamic influence model for identifying influential users.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

The purpose of this research was to apply model checking by using a symbolic model checker on Predicate Transition Nets (PrT Nets). A PrT Net is a formal model of information flow which allows system properties to be modeled and analyzed. The aim of this thesis was to use the modeling and analysis power of PrT nets to provide a mechanism for the system model to be verified. Symbolic Model Verifier (SMV) was the model checker chosen in this thesis, and in order to verify the PrT net model of a system, it was translated to SMV input language. A software tool was implemented which translates the PrT Net into SMV language, hence enabling the process of model checking. The system includes two parts: the PrT net editor where the representation of a system can be edited, and the translator which converts the PrT net into an SMV program.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

In recent years, a surprising new phenomenon has emerged in which globally-distributed online communities collaborate to create useful and sophisticated computer software. These open source software groups are comprised of generally unaffiliated individuals and organizations who work in a seemingly chaotic fashion and who participate on a voluntary basis without direct financial incentive. The purpose of this research is to investigate the relationship between the social network structure of these intriguing groups and their level of output and activity, where social network structure is defined as 1) closure or connectedness within the group, 2) bridging ties which extend outside of the group, and 3) leader centrality within the group. Based on well-tested theories of social capital and centrality in teams, propositions were formulated which suggest that social network structures associated with successful open source software project communities will exhibit high levels of bridging and moderate levels of closure and leader centrality. The research setting was the SourceForge hosting organization and a study population of 143 project communities was identified. Independent variables included measures of closure and leader centrality defined over conversational ties, along with measures of bridging defined over membership ties. Dependent variables included source code commits and software releases for community output, and software downloads and project site page views for community activity. A cross-sectional study design was used and archival data were extracted and aggregated for the two-year period following the first release of project software. The resulting compiled variables were analyzed using multiple linear and quadratic regressions, controlling for group size and conversational volume. Contrary to theory-based expectations, the surprising results showed that successful project groups exhibited low levels of closure and that the levels of bridging and leader centrality were not important factors of success. These findings suggest that the creation and use of open source software may represent a fundamentally new socio-technical development process which disrupts the team paradigm and which triggers the need for building new theories of collaborative development. These new theories could point towards the broader application of open source methods for the creation of knowledge-based products other than software.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Ensuring the correctness of software has been the major motivation in software research, constituting a Grand Challenge. Due to its impact in the final implementation, one critical aspect of software is its architectural design. By guaranteeing a correct architectural design, major and costly flaws can be caught early on in the development cycle. Software architecture design has received a lot of attention in the past years, with several methods, techniques and tools developed. However, there is still more to be done, such as providing adequate formal analysis of software architectures. On these regards, a framework to ensure system dependability from design to implementation has been developed at FIU (Florida International University). This framework is based on SAM (Software Architecture Model), an ADL (Architecture Description Language), that allows hierarchical compositions of components and connectors, defines an architectural modeling language for the behavior of components and connectors, and provides a specification language for the behavioral properties. The behavioral model of a SAM model is expressed in the form of Petri nets and the properties in first order linear temporal logic. This dissertation presents a formal verification and testing approach to guarantee the correctness of Software Architectures. The Software Architectures studied are expressed in SAM. For the formal verification approach, the technique applied was model checking and the model checker of choice was Spin. As part of the approach, a SAM model is formally translated to a model in the input language of Spin and verified for its correctness with respect to temporal properties. In terms of testing, a testing approach for SAM architectures was defined which includes the evaluation of test cases based on Petri net testing theory to be used in the testing process at the design level. Additionally, the information at the design level is used to derive test cases for the implementation level. Finally, a modeling and analysis tool (SAM tool) was implemented to help support the design and analysis of SAM models. The results show the applicability of the approach to testing and verification of SAM models with the aid of the SAM tool.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Computing devices have become ubiquitous in our technologically-advanced world, serving as vehicles for software applications that provide users with a wide array of functions. Among these applications are electronic learning software, which are increasingly being used to educate and evaluate individuals ranging from grade school students to career professionals. This study will evaluate the design and implementation of user interfaces in these pieces of software. Specifically, it will explore how these interfaces can be developed to facilitate the use of electronic learning software by children. In order to do this, research will be performed in the area of human-computer interaction, focusing on cognitive psychology, user interface design, and software development. This information will be analyzed in order to design a user interface that provides an optimal user experience for children. This group will test said interface, as well as existing applications, in order to measure its usability. The objective of this study is to design a user interface that makes electronic learning software more usable for children, facilitating their learning process and increasing their academic performance. This study will be conducted by using the Adobe Creative Suite to design the user interface and an Integrated Development Environment to implement functionality. These are digital tools that are available on computing devices such as desktop computers, laptops, and smartphones, which will be used for the development of software. By using these tools, I hope to create a user interface for electronic learning software that promotes usability while maintaining functionality. This study will address the increasing complexity of computing software seen today – an issue that has risen due to the progressive implementation of new functionality. This issue is having a detrimental effect on the usability of electronic learning software, increasing the learning curve for targeted users such as children. As we make electronic learning software an integral part of educational programs in our schools, it is important to address this in order to guarantee them a successful learning experience.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Computing devices have become ubiquitous in our technologically-advanced world, serving as vehicles for software applications that provide users with a wide array of functions. Among these applications are electronic learning software, which are increasingly being used to educate and evaluate individuals ranging from grade school students to career professionals. This study will evaluate the design and implementation of user interfaces in these pieces of software. Specifically, it will explore how these interfaces can be developed to facilitate the use of electronic learning software by children. In order to do this, research will be performed in the area of human-computer interaction, focusing on cognitive psychology, user interface design, and software development. This information will be analyzed in order to design a user interface that provides an optimal user experience for children. This group will test said interface, as well as existing applications, in order to measure its usability. The objective of this study is to design a user interface that makes electronic learning software more usable for children, facilitating their learning process and increasing their academic performance. This study will be conducted by using the Adobe Creative Suite to design the user interface and an Integrated Development Environment to implement functionality. These are digital tools that are available on computing devices such as desktop computers, laptops, and smartphones, which will be used for the development of software. By using these tools, I hope to create a user interface for electronic learning software that promotes usability while maintaining functionality. This study will address the increasing complexity of computing software seen today – an issue that has risen due to the progressive implementation of new functionality. This issue is having a detrimental effect on the usability of electronic learning software, increasing the learning curve for targeted users such as children. As we make electronic learning software an integral part of educational programs in our schools, it is important to address this in order to guarantee them a successful learning experience.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

The software product line engineering brings advantages when compared with the traditional software development regarding the mass customization of the system components. However, there are scenarios that to maintain separated clones of a software system seems to be an easier and more flexible approach to manage their variabilities of a software product line. This dissertation evaluates qualitatively an approach that aims to support the reconciliation of functionalities between cloned systems. The analyzed approach is based on mining data about the issues and source code of evolved cloned web systems. The next step is to process the merge conflicts collected by the approach and not indicated by traditional control version systems to identify potential integration problems from the cloned software systems. The results of the study show the feasibility of the approach to perform a systematic characterization and analysis of merge conflicts for large-scale web-based systems.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

The spread of wireless networks and growing proliferation of mobile devices require the development of mobility control mechanisms to support the different demands of traffic in different network conditions. A major obstacle to developing this kind of technology is the complexity involved in handling all the information about the large number of Moving Objects (MO), as well as the entire signaling overhead required to manage these procedures in the network. Despite several initiatives have been proposed by the scientific community to address this issue they have not proved to be effective since they depend on the particular request of the MO that is responsible for triggering the mobility process. Moreover, they are often only guided by wireless medium statistics, such as Received Signal Strength Indicator (RSSI) of the candidate Point of Attachment (PoA). Thus, this work seeks to develop, evaluate and validate a sophisticated communication infrastructure for Wireless Networking for Moving Objects (WiNeMO) systems by making use of the flexibility provided by the Software-Defined Networking (SDN) paradigm, where network functions are easily and efficiently deployed by integrating OpenFlow and IEEE 802.21 standards. For purposes of benchmarking, the analysis was conducted in the control and data planes aspects, which demonstrate that the proposal significantly outperforms typical IPbased SDN and QoS-enabled capabilities, by allowing the network to handle the multimedia traffic with optimal Quality of Service (QoS) transport and acceptable Quality of Experience (QoE) over time.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Heating, ventilation, air conditioning (HVAC) systems are significant consumers of energy, however building management systems do not typically operate them in accordance with occupant movements. Due to the delayed response of HVAC systems, prediction of occupant locations is necessary to maximize energy efficiency. We present an approach to occupant location prediction based on association rule mining, allowing prediction based on historical occupant locations. Association rule mining is a machine learning technique designed to find any correlations which exist in a given dataset. Occupant location datasets have a number of properties which differentiate them from the market basket datasets that association rule mining was originally designed for. This thesis adapts the approach to suit such datasets, focusing the rule mining process on patterns which are useful for location prediction. This approach, named OccApriori, allows for the prediction of occupants’ next locations as well as their locations further in the future, and can take into account any available data, for example the day of the week, the recent movements of the occupant, and timetable data. By integrating an existing extension of association rule mining into the approach, it is able to make predictions based on general classes of locations as well as specific locations.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

This research has explored the relationship between system test complexity and tacit knowledge. It is proposed as part of this thesis, that the process of system testing (comprising of test planning, test development, test execution, test fault analysis, test measurement, and case management), is directly affected by both complexity associated with the system under test, and also by other sources of complexity, independent of the system under test, but related to the wider process of system testing. While a certain amount of knowledge related to the system under test is inherent, tacit in nature, and therefore difficult to make explicit, it has been found that a significant amount of knowledge relating to these other sources of complexity, can indeed be made explicit. While the importance of explicit knowledge has been reinforced by this research, there has been a lack of evidence to suggest that the availability of tacit knowledge to a test team is of any less importance to the process of system testing, when operating in a traditional software development environment. The sentiment was commonly expressed by participants, that even though a considerable amount of explicit knowledge relating to the system is freely available, that a good deal of knowledge relating to the system under test, which is demanded for effective system testing, is actually tacit in nature (approximately 60% of participants operating in a traditional development environment, and 60% of participants operating in an agile development environment, expressed similar sentiments). To cater for the availability of tacit knowledge relating to the system under test, and indeed, both explicit and tacit knowledge required by system testing in general, an appropriate knowledge management structure needs to be in place. This would appear to be required, irrespective of the employed development methodology.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Online Social Network (OSN) services provided by Internet companies bring people together to chat, share the information, and enjoy the information. Meanwhile, huge amounts of data are generated by those services (they can be regarded as the social media ) every day, every hour, even every minute, and every second. Currently, researchers are interested in analyzing the OSN data, extracting interesting patterns from it, and applying those patterns to real-world applications. However, due to the large-scale property of the OSN data, it is difficult to effectively analyze it. This dissertation focuses on applying data mining and information retrieval techniques to mine two key components in the social media data — users and user-generated contents. Specifically, it aims at addressing three problems related to the social media users and contents: (1) how does one organize the users and the contents? (2) how does one summarize the textual contents so that users do not have to go over every post to capture the general idea? (3) how does one identify the influential users in the social media to benefit other applications, e.g., Marketing Campaign? The contribution of this dissertation is briefly summarized as follows. (1) It provides a comprehensive and versatile data mining framework to analyze the users and user-generated contents from the social media. (2) It designs a hierarchical co-clustering algorithm to organize the users and contents. (3) It proposes multi-document summarization methods to extract core information from the social network contents. (4) It introduces three important dimensions of social influence, and a dynamic influence model for identifying influential users.

Relevância:

50.00% 50.00%

Publicador:

Resumo:

Scientists planning to use underwater stereoscopic image technologies are often faced with numerous problems during the methodological implementations: commercial equipment is too expensive; the setup or calibration is too complex; or the imaging processing (i.e. measuring objects in the stereo-images) is too complicated to be performed without a time-consuming phase of training and evaluation. The present paper addresses some of these problems and describes a workflow for stereoscopic measurements for marine biologists. It also provides instructions on how to assemble an underwater stereo-photographic system with two digital consumer cameras and gives step-by-step guidelines for setting up the hardware. The second part details a software procedure to correct stereo-image pairs for lens distortions, which is especially important when using cameras with non-calibrated optical units. The final part presents a guide to the process of measuring the lengths (or distances) of objects in stereoscopic image pairs. To reveal the applicability and the restrictions of the described systems and to test the effects of different types of camera (a compact camera and an SLR type), experiments were performed to determine the precision and accuracy of two generic stereo-imaging units: a diver-operated system based on two Olympus Mju 1030SW compact cameras and a cable-connected observatory system based on two Canon 1100D SLR cameras. In the simplest setup without any correction for lens distortion, the low-budget Olympus Mju 1030SW system achieved mean accuracy errors (percentage deviation of a measurement from the object's real size) between 10.2 and -7.6% (overall mean value: -0.6%), depending on the size, orientation and distance of the measured object from the camera. With the single lens reflex (SLR) system, very similar values between 10.1% and -3.4% (overall mean value: -1.2%) were observed. Correction of the lens distortion significantly improved the mean accuracy errors of either system. Even more, system precision (spread of the accuracy) improved significantly in both systems. Neither the use of a wide-angle converter nor multiple reassembly of the system had a significant negative effect on the results. The study shows that underwater stereophotography, independent of the system, has a high potential for robust and non-destructive in situ sampling and can be used without prior specialist training.