905 results for Web Mining, Data Mining, User Topic Model, Web User Profiles
Abstract:
The aim of this study was to group, by similarity, the temporal profiles of the 10-day composite NDVI product obtained by the SPOT Vegetation sensor for municipalities with high soybean production in the state of Paraná, Brazil, in the 2005/2006 cropping season. Data mining is a valuable tool for extracting knowledge from a database by identifying valid, new, potentially useful and understandable patterns. Clusters were therefore generated with the K-Means, MAXVER and DBSCAN algorithms as implemented in the WEKA software package. The clusters were based on the average temporal NDVI profiles of the 277 municipalities with high soybean production in the state. The best results were found with the K-Means algorithm, which grouped the municipalities into six clusters over the period from the beginning of October to the end of March, which corresponds to the crop vegetative cycle. Half of the generated clusters showed the spectro-temporal pattern characteristic of soybeans and lay mostly within the soybean belt of the state of Paraná, which indicates that the proposed methodology performs well at identifying homogeneous areas. These results will be useful for creating regional soybean "masks" to estimate the planted area of this crop.
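The clustering step described above can be illustrated with a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. This is not the WEKA implementation used in the study; the NDVI values, the two profile shapes and the function name `kmeans` are hypothetical and serve only to show how temporal profiles are grouped by similarity.

```python
import random

def kmeans(profiles, k, iterations=50, seed=0):
    """Plain K-Means (Lloyd's algorithm) over equal-length numeric profiles."""
    rng = random.Random(seed)
    # Start from k profiles picked at random as initial centroids.
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = [-1] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid
        # (squared Euclidean distance over all time steps).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in profiles
        ]
        # Update step: each centroid becomes the mean profile of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm converged
        labels = new_labels
    return labels, centroids

# Hypothetical 18-step profiles (October to March, three 10-day composites
# per month): a crop-like curve that rises and falls, and a flat curve.
crop_like = [0.30, 0.32, 0.38, 0.45, 0.55, 0.65, 0.75, 0.82, 0.86,
             0.85, 0.80, 0.72, 0.62, 0.52, 0.42, 0.36, 0.32, 0.30]
flat = [0.60] * 18
profiles = ([[v + 0.01 * i for v in crop_like] for i in range(5)]
            + [[v + 0.01 * i for v in flat] for i in range(5)])

labels, centroids = kmeans(profiles, k=2)
print(labels)
```

With real data, each row would be one municipality's average NDVI profile, and `k` would be chosen by comparing cluster quality across several values.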
Abstract:
This study aimed to identify differences in swine vocalization patterns according to animal gender and different stress conditions. A total of 150 barrows and 150 females (Dalland® genetic strain), aged 100 days, were used in the experiment. Pigs were exposed to different stressful situations: thirst (no access to water), hunger (no access to food), and thermal stress (THI exceeding 74). For the control treatment, animals were kept under comfortable conditions (full access to food and water, with environmental THI lower than 70). Acoustic signals were recorded every 30 minutes, totaling six samples for each stress situation. Afterwards, the recordings were analyzed with Praat® 5.1.19 software, generating a sound spectrum. To determine stress conditions, data were processed with WEKA® 3.5 software using the C4.5 decision tree algorithm, known as J48 in the software environment, and evaluated with 10-fold cross-validation (each fold holding out 10% of the samples). According to the decision tree, the most important acoustic attribute for classifying stress conditions was sound intensity (the root node). With the tested attributes, it was not possible to identify animal gender from the vocal record. A decision tree was generated for recognizing swine hunger, thirst, and heat stress from records of sound intensity, pitch frequency, and Formant 1.
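The 10-fold cross-validation mentioned above can be sketched as follows. This is a generic illustration in plain Python, not WEKA's own procedure, which may differ (for example by stratifying folds by class); the sample count of 300 and the function name `k_fold_indices` are assumptions made for the example.

```python
import random

def k_fold_indices(n_samples, k=10, seed=42):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Every fold except fold i is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# Hypothetical data set of 300 samples: each fold holds out 10% for testing.
splits = list(k_fold_indices(300, k=10))
for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))
```

Each sample is tested exactly once across the ten rounds, so the averaged accuracy uses every record without ever testing on data seen during training.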
Model-View-Controller architectural pattern and its evolution in graphical user interface frameworks
Abstract:
Model-View-Controller (MVC) is an architectural pattern used in software development for graphical user interfaces. Proposed in the late 1970s, it was one of the first solutions to the Smart UI anti-pattern, which refers to writing all domain logic into the user interface. The original MVC pattern has since evolved in multiple directions under various names, which can be confusing. The goal of this thesis is to present the origin of the MVC pattern and how it has changed over time. Software architecture in general and MVC's evolution within web applications are not the primary focus. Fundamental designs are abstracted and then used to examine the more recent versions. Problems with the subject and its terminology are also presented.
Abstract:
Choosing industrial development options and allocating research funds accordingly are becoming increasingly difficult because of rising R&D costs and pressure for shorter development periods. Forecasts of research progress are based on analysis of publication activity in the field of interest and on the dynamics of its change. Moreover, allocation of funds is hindered by exponential growth in the number of publications and patents. Thematic clusters are becoming harder to identify, and their evolution harder to follow. The existing approaches to structuring a research field and identifying its development are very limited: they do not identify thematic clusters with adequate precision, and the trends they identify are often ambiguous. There is therefore a clear need for methods and tools that can identify developing fields of research. The main objective of this thesis is to develop tools and methods that help identify promising research topics in the field of separation processes. Two structuring methods and three approaches for identifying development trends are proposed. The proposed methods have been applied to the analysis of research on distillation and filtration. The results show that the developed methods are universal and could be used to study various fields of research. The identified thematic clusters and the forecasted trends of their development were confirmed in almost all tested cases, which supports the universality of the proposed methods. The results allow identification of fast-growing scientific fields as well as topics characterized by stagnant or diminishing research activity.
Abstract:
A web service is a software system that provides a machine-processable interface to other machines over the network using different Internet protocols. Web services are increasingly used in industry to automate tasks and offer services to a wider audience. The REST architectural style aims at producing scalable and extensible web services using technologies that play well with the existing tools and infrastructure of the web. It provides a uniform set of operations that can be used to invoke the CRUD interface (create, retrieve, update and delete) of a web service. The stateless behavior of the service interface requires that every request to a resource be independent of the previous ones, which facilitates scalability. Automated systems, e.g., hotel reservation systems, provide advanced scenarios for stateful services in which a certain sequence of requests must be followed in order to fulfill the service goals. Designing and developing such services for advanced scenarios under REST constraints requires rigorous approaches capable of creating web services that can be trusted for their behavior. Systems that can be trusted for their behavior can be termed dependable systems. This thesis presents an integrated design, analysis and validation approach that helps the service developer create dependable and stateful REST web services. The main contribution of this thesis is a novel model-driven methodology to design behavioral REST web service interfaces and their compositions. The behavioral interfaces provide information on what methods can be invoked on a service and on the pre- and post-conditions of these methods. The methodology uses the Unified Modeling Language (UML) as the modeling language, which has a wide user base and mature tools that are continuously evolving.
We have used UML class diagrams and UML state machine diagrams with additional design constraints to provide resource and behavioral models, respectively, for designing REST web service interfaces. These service design models serve as a specification document, and the information presented in them has manifold applications. The service design models also contain information about the time and domain requirements of the service, which helps with requirement traceability, an important part of our approach. Requirement traceability helps in capturing faults in the design models and other elements of the software development environment by tracing the unfulfilled requirements of the service back and forth. Information about service actors is also included in the design models; it is required for authenticating service requests by authorized actors, since not all types of users have access to all the resources. In addition, by following our design approach, the service developer can ensure that the designed web service interfaces will be REST compliant. The second contribution of this thesis is consistency analysis of the behavioral REST interfaces. To overcome inconsistency problems and design errors in our service models, we have used semantic technologies. The REST interfaces are represented in the Web Ontology Language, OWL 2, that can be part of the semantic web. These interfaces are used with OWL 2 reasoners to check for unsatisfiable concepts, which would result in implementations that fail. This work is fully automated thanks to the implemented translation tool and the existing OWL 2 reasoners. The third contribution of this thesis is the verification and validation of REST web services. We have used model checking techniques with the UPPAAL model checker for this purpose.
The timed automata of the UML-based service design models are generated with our transformation tool and verified for basic characteristics such as deadlock freedom, liveness, reachability and safety. The implementation of a web service is tested using a black-box testing approach. Test cases are generated from the UPPAAL timed automata, and the service implementation is validated at runtime against its specifications using the online testing tool UPPAAL TRON. Requirement traceability is also addressed in our validation approach, with which we can see which service goals are met and trace the unfulfilled service goals back to detect faults in the design models. A final contribution of the thesis is an implementation of behavioral REST interfaces and service monitors from the service design models. The partial code generation tool creates code skeletons of REST web services with method pre- and post-conditions. The preconditions of methods constrain the user to invoke the stateful REST service under the right conditions, and the postconditions constrain the service developer to implement the right functionality. The details of the methods can be manually inserted by the developer as required. We do not target complete automation because we focus only on the interface aspects of the web service. The applicability of the approach is demonstrated with a pedagogical example of a hotel room booking service and a relatively complex worked example of a holiday booking service taken from an industrial context. The former example presents a simple explanation of the approach, and the latter worked example shows how stateful and timed web services offering complex scenarios and involving other web services can be constructed using our approach.
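The idea of a behavioral interface with method pre- and post-conditions can be illustrated with a small sketch. It is not the code produced by the thesis's generation tool: the states, the method names (`PUT_payment`, `DELETE`) and the classes `Booking` and `PreconditionError` are hypothetical, loosely modeled on the hotel booking example.

```python
class PreconditionError(Exception):
    """Raised when a method is invoked in a state that does not allow it."""

class Booking:
    # Hypothetical behavioral model: (method, current state) -> expected next state.
    TRANSITIONS = {
        ("PUT_payment", "created"): "paid",
        ("DELETE", "created"): "cancelled",
        ("DELETE", "paid"): "cancelled",
    }

    def __init__(self):
        self.state = "created"

    def invoke(self, method):
        key = (method, self.state)
        # Precondition: the method must be allowed in the current state.
        if key not in self.TRANSITIONS:
            raise PreconditionError(f"{method} not allowed in state {self.state!r}")
        expected = self.TRANSITIONS[key]
        self._apply(method)
        # Postcondition: the implementation must reach the specified state.
        assert self.state == expected, "postcondition violated"
        return self.state

    def _apply(self, method):
        # Placeholder for the developer-supplied method body.
        self.state = "paid" if method == "PUT_payment" else "cancelled"

booking = Booking()
print(booking.invoke("PUT_payment"))
print(booking.invoke("DELETE"))
```

The precondition check guards the caller's side of the contract, while the postcondition assertion guards the implementer's side, mirroring the split of responsibilities described in the abstract.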
Abstract:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
Technological innovations, the development of the internet, and globalization have increased the number and complexity of web applications. As a result, keeping web user interfaces understandable and usable (in terms of ease of use, effectiveness, and satisfaction) is a challenge. As part of this, designing user-intuitive interface signs (i.e., the small elements of a web user interface, e.g., navigational links, command buttons, icons, small images, thumbnails, etc.) is an issue for designers. Interface signs are key elements of web user interfaces because they act as communication artefacts that convey web content and system functionality, and because users interact with systems by means of interface signs. In light of the above, applying concepts from semiotics (i.e., the study of signs) to web interface signs will help uncover new and important perspectives on web user interface design and evaluation. The thesis mainly focuses on web interface signs and uses semiotics as its background theory. The underlying aim of this thesis is to provide valuable insights for designing and evaluating web user interfaces from a semiotic perspective in order to improve overall web usability. The fundamental research question is formulated as: What do practitioners and researchers need to be aware of from a semiotic perspective when designing or evaluating web user interfaces to improve web usability? From a methodological perspective, the thesis follows a design science research (DSR) approach. A systematic literature review and six empirical studies are carried out in this thesis. The empirical studies are carried out with a total of 74 participants in Finland. The steps of a design science research process were followed in designing and conducting the studies: (a) problem identification and motivation, (b) definition of objectives of a solution, (c) design and development, (d) demonstration, (e) evaluation, and (f) communication.
The data is collected using observations in a usability testing lab, by analytical (expert) inspection, with questionnaires, and in structured and semi-structured interviews. User behaviour analysis, qualitative analysis and statistics are used to analyze the study data. The results are summarized as follows and have led to the following contributions. Firstly, the results present the current status of semiotic research in UI design and evaluation and highlight the importance of considering semiotic concepts in UI design and evaluation. Secondly, the thesis explores interface sign ontologies (i.e., sets of concepts and skills that a user should know to interpret the meaning of interface signs) by providing a set of ontologies used to interpret the meaning of interface signs, and by providing a set of features related to ontology mapping in interpreting the meaning of interface signs. Thirdly, the thesis explores the value of integrating semiotic concepts in usability testing. Fourthly, the thesis proposes a semiotic framework (Semiotic Interface sign Design and Evaluation – SIDE) for interface sign design and evaluation in order to make interface signs intuitive for end users and to improve web usability. The SIDE framework includes a set of determinants and attributes of user-intuitive interface signs, and a set of semiotic heuristics to design and evaluate interface signs. Finally, the thesis assesses (a) the quality of the SIDE framework in terms of performance metrics (e.g., thoroughness, validity, effectiveness, reliability, etc.) and (b) the contributions of the SIDE framework from the evaluators’ perspective.
Abstract:
The importance of scalable (responsive) web pages is growing because web pages are viewed on devices of very different sizes and resolutions. When pages scale across devices, there is no need to build separate mobile pages or a traditional native application for each device; a single page works on all devices. The problem is getting web applications to work on different devices, because the devices' browsers may have small differences that make it laborious to get a scalable user interface working on all of them. To support the development of scalable pages, various user interface and graphics libraries have been created that help implement page scalability. Using libraries saves development time and outsources maintenance of the library to a third party, which leaves more time for developing the application itself. This work studies different user interface and graphics library alternatives for implementing a user interface. A simple prototype of a network monitoring system is implemented, and a scalable user interface and graphics library is selected for it. The system consists of three parts: the user interface, a service, and data sources, from which the service collects data to be displayed in the user interface.
Abstract:
With the growth of new technologies, using online tools has become part of everyday life. This has a great impact on researchers, as the data obtained from various experiments need to be analyzed, and knowledge of programming has become mandatory even for pure biologists. Hence, VTT came up with a new tool, R Executables (REX), a web application designed to provide a graphical interface for biological data functions such as image analysis, gene expression data analysis, plotting, and disease and control studies, which employs R functions to provide results. REX provides an interactive application in which biologists directly enter values and run the required analysis with a single click. The program processes the given data in the background and prints results rapidly. Due to the growth of data and the load on the server, the interface developed problems concerning time consumption, a poor GUI, data storage, security, a minimally interactive user experience, and crashes with large amounts of data. This thesis describes the methods by which these problems were resolved to make REX a better application for the future. The old REX was developed using Python Django; the new version has been implemented with Vaadin. Vaadin is a Java framework for developing web applications, so code is written in Java using Vaadin's rich components. Vaadin provides better security, better speed, and a good, interactive interface. In this thesis, a subset of REX functionality, comprising IST bulk plotting and image segmentation, was selected and implemented using Vaadin. I wrote 662 lines of code, with Vaadin as the front-end handler and R used for back-end data retrieval, computing and plotting. The application is optimized to allow further functionality to be migrated with ease from the old REX.
Future development is focused on including high-throughput screening functions along with gene expression database handling.
Abstract:
This work discusses visitor tracking methods and implements them in practice. The operation of web analytics software is examined, focusing mainly on Google Analytics. The goal is to determine the usage volumes of the tourism terminals in Lappeenranta and to break them down by device. A literature review of web analytics is carried out, and visitor tracking data from two different websites are analyzed and compared. In addition, the logs of the tourism terminals' website are examined by means of data mining with a Python application developed for the purpose. Based on the work, it can be concluded that the usage volumes of the tourism terminals cannot be broken down by device with the current implementation. The number of sessions and events can nevertheless be tracked. Several problems are identified in the visitor tracking of the terminals, such as the distorting effect of the terminals' automatic page refresh on the results, the partial Google Analytics integration and, most importantly, the lack of an identifier that uniquely identifies each terminal. The work proposes solutions that enable effective use of visitor tracking and device-level tracking. The results emphasize the importance of systematic planning in implementing visitor tracking.
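Counting sessions from log timestamps, as mentioned above, can be sketched in a few lines. This is not the Python application developed in the work; the function name `count_sessions`, the example timestamps and the 30-minute inactivity timeout are assumptions made for illustration.

```python
from datetime import datetime, timedelta

def count_sessions(timestamps, timeout=timedelta(minutes=30)):
    """Count sessions: a new session starts after a gap longer than `timeout`."""
    sessions = 0
    previous = None
    for ts in sorted(timestamps):
        # The first hit, or a hit after a long pause, opens a new session.
        if previous is None or ts - previous > timeout:
            sessions += 1
        previous = ts
    return sessions

# Hypothetical hit times taken from one terminal's web server log.
hits = [
    datetime(2014, 5, 1, 10, 0),
    datetime(2014, 5, 1, 10, 5),
    datetime(2014, 5, 1, 10, 50),
    datetime(2014, 5, 1, 11, 0),
    datetime(2014, 5, 1, 12, 0),
]
print(count_sessions(hits))
```

An automatic page refresh that fires more often than the timeout would keep a single session open indefinitely, which is one way such refreshes can distort session counts.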
Abstract:
Modern societies depend more and more on computer systems, and development teams are therefore under increasing pressure to produce good-quality software. Many companies use quality models, suites of programs that analyze and evaluate the quality of other programs, but building quality models is difficult because several questions remain unanswered in the literature. We studied quality modeling practices at a large company and identified three dimensions where additional research is desirable: support for the subjectivity of quality, techniques for tracking quality as software evolves, and the composition of quality across different levels of abstraction. Concerning subjectivity, we proposed the use of Bayesian models because they are able to handle ambiguous data. We applied our models to the problem of detecting design defects. In a study of two open-source systems, we found that our approach outperforms the rule-based techniques described in the state of the art. To support software evolution, we treated the scores produced by a quality model as signals that can be analyzed using data mining techniques to identify patterns in the evolution of quality. We studied how design defects appear in and disappear from software. Software is typically designed as a hierarchy of components, but quality models do not take this organization into account. In the last part of the dissertation, we present a two-level quality model.
These models have three parts: a component-level model, a model that evaluates the importance of each component, and a model that evaluates the quality of a composite by combining the quality of its components. The approach was tested on the prediction of change-prone classes from the quality of their methods. We found that our two-level models allow better identification of change-prone classes. Finally, we applied our two-level models to the evaluation of website navigability from the quality of the pages. Our models were able to distinguish between sites of very good quality and randomly chosen sites. Throughout the dissertation, we present not only theoretical problems and their solutions, but also the experiments we conducted to demonstrate the advantages and limitations of our solutions. Our results indicate that one can hope to improve the state of the art in the three dimensions presented. In particular, our work on the composition of quality and the modeling of importance is the first to target this problem. We believe that our two-level models are an interesting starting point for more in-depth research.
Abstract:
The Web is currently a privileged space of expression and activity for many communities, where communication practices and documentary practices enrich one another. In its visible or invisible dimension, the Web is also a planet-wide documentary reservoir characterized not only by the abundance of the information circulating on it, but also by its diversity, its complexity and its ephemeral nature. Many current Web archiving projects approach this question from the point of view of preserving online publications, without considering it from an archival perspective. Only a few Web archiving projects aim to preserve the organizational or governmental Web. The archival value of the Web, particularly the organizational Web, does not seem to be recognized despite a sustained effort by some national archives to disseminate policies for archiving the organizational Web. The purpose of this thesis is to develop a better understanding of the nature of Web archives and to document current practices for archiving the organizational Web. More specifically, this research aims to answer the following three questions: (1) What do policies for archiving the organizational Web generally recommend? (2) What are the main characteristics of Web archives? (3) What practices for archiving the organizational Web are in place in organizations in Quebec?
To answer these questions, this exploratory and descriptive research adopted a qualitative approach based on three modes of data collection: the analysis of a corpus of 55 policies and complementary documents on archiving the organizational Web; the observation of 11 public websites of organizations in Quebec, together with the observation of a sample of 737 documents produced by these Web systems; and, finally, interviews with 21 participants involved in the management and archiving of these websites. The research results show that the websites studied are the product of an organization's conduct of its online activities and, at the same time, document the objectives and manifestations of its presence on the Web. New types of documents specific to the organizational Web were identified. Documents that have migrated to the Web have acquired a different context of use and new characteristics. Current management methods must take into account the properties of documents in a Web environment. While some of the organizations studied do not archive their public website, others invest in doing so. However, the choices made do not always correspond to the recommendations in the Web archiving policies analyzed, and they guarantee neither the longevity of the Web archives nor their long-term usability. This finding led us to propose a model policy adapted to the characteristics of Web archives. This model describes the essential components of a policy for archiving websites, as well as a range of measures the organization could put in place depending on the results of an analysis of the risks associated with using its public website in the conduct of its business.
Abstract:
Data mining is one of the most active research areas today, with a wide variety of applications in everyday life. It is about finding interesting hidden patterns in a large historical database. For example, from a sales database, data mining can reveal a pattern such as “people who buy magazines tend to buy newspapers also”. From a sales point of view, the advantage is that these items can be placed together in the shop to increase sales. In this research work, data mining is applied to a domain called placement chance prediction, since making a wise career decision is crucial for everyone. In India, technical manpower analysis is carried out by an organization named the National Technical Manpower Information System (NTMIS), established in 1983-84 by India's Ministry of Education & Culture. The NTMIS comprises a lead centre in the IAMR, New Delhi, and 21 nodal centres located in different parts of the country. The Kerala State Nodal Centre is located at Cochin University of Science and Technology. The nodal centre collects placement information by regularly sending postal questionnaires to graduates. From the raw data available in the nodal centre, a historical database was prepared. Each record in this database includes entrance rank range, reservation, sector, sex, and a particular engineering branch. For each combination of these attributes in the historical database of student records, the corresponding placement chance is computed and stored in the database. From these data, various popular data mining models are built and tested. These models can be used to predict the most suitable branch for a new student with a given combination of the above criteria.
A detailed performance comparison of the various data mining models is also carried out. This research work proposes using a combination of data mining models, namely a hybrid stacking ensemble, for better predictions. Strategies are also proposed to predict the overall absorption rate for various branches, as well as the time it takes for all the students of a particular branch to be placed. Finally, this research work puts forward a new data mining algorithm for numeric data sets, namely C 4.5 * stat, which has been shown to achieve competitive accuracy on the standard benchmark data sets known as the UCI data sets. It also proposes an optimization strategy called parameter tuning to improve the standard C 4.5 algorithm. In summary, this research work covers all four dimensions of a typical data mining research work: application to a domain, development of classifier models, optimization, and ensemble methods.
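The step in which a placement chance is computed for each combination of attributes can be sketched as a simple group-by. This is only an illustration under assumed data: the attribute values, the record layout and the function name `placement_chances` are hypothetical and do not come from the NTMIS database.

```python
from collections import defaultdict

def placement_chances(records):
    """Return the placed fraction for each combination of attribute values.

    Each record is a tuple whose last element is True if the student was
    placed; all preceding elements form the attribute combination.
    """
    counts = defaultdict(lambda: [0, 0])  # combination -> [placed, total]
    for *attributes, placed in records:
        key = tuple(attributes)
        counts[key][1] += 1
        if placed:
            counts[key][0] += 1
    return {key: placed / total for key, (placed, total) in counts.items()}

# Hypothetical records: (rank range, reservation, sector, sex, branch, placed).
records = [
    ("1-2000", "general", "urban", "F", "computer science", True),
    ("1-2000", "general", "urban", "F", "computer science", False),
    ("2001-4000", "general", "rural", "M", "civil", True),
]
chances = placement_chances(records)
for combination, chance in chances.items():
    print(combination, chance)
```

The resulting fractions could then serve as the target values on which classification models are trained and compared.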
Abstract:
For years, choosing the right career by monitoring the trends and scope of different career paths has been a need for young people all over the world. In this paper we provide a scientific, data-mining-based method for predicting the job absorption rate and the waiting time needed for 100% placement for different engineering courses in India. This will greatly help students in India in deciding on the right discipline for a bright future. Information about graduates is obtained from the NTMIS (National Technical Manpower Information System) nodal centre in Kochi, India, located at Cochin University of Science and Technology.
Abstract:
In the current study, an epidemiological study is carried out by means of a literature survey, both in groups identified as having a higher potential for drug-drug interactions (DDIs) and in other cases, to explore patterns of DDIs and the factors affecting them. The structure of the FDA Adverse Event Reporting System (FAERS) database is studied and analyzed in detail to identify issues and challenges in mining it for drug-drug interactions. The necessary pre-processing algorithms are developed based on this analysis, and the Apriori algorithm is modified to suit the process. Finally, the modules are integrated into a tool to identify DDIs. The results are validated by comparison against a standard drug interaction database: 69% of the associations obtained matched existing interactions, and 31% were identified as new. This match indicates the validity of the methodology and its applicability to similar databases. Formulating the results using generic names extended their relevance to a global scale. This global applicability helps health care professionals worldwide to exercise caution during the various stages of drug administration, thus considerably enhancing pharmacovigilance.
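The core idea of Apriori, on which the study's modified algorithm is based, can be sketched for the first two levels (single items, then pairs). This is the textbook pruning step, not the modified version from the study; the drug names are placeholders, and the function name `frequent_pairs` and the support threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(reports, min_support):
    """Return {pair: support} for drug pairs that co-occur often enough.

    Apriori pruning: a pair can only be frequent if both of its drugs
    are frequent on their own, so infrequent drugs are dropped first.
    """
    n = len(reports)
    item_counts = Counter(drug for report in reports for drug in set(report))
    frequent_items = {d for d, c in item_counts.items() if c / n >= min_support}
    pair_counts = Counter()
    for report in reports:
        kept = sorted(set(report) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {pair: c / n for pair, c in pair_counts.items() if c / n >= min_support}

# Hypothetical adverse event reports, each listing the drugs involved.
reports = [
    ["drug_a", "drug_b"],
    ["drug_a", "drug_b", "drug_c"],
    ["drug_c"],
    ["drug_a", "drug_b"],
]
pairs = frequent_pairs(reports, min_support=0.5)
print(pairs)
```

Candidate pairs found this way would still need to be compared against a reference interaction database, as the abstract describes, before being treated as known or new associations.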