838 results for text and data mining
Abstract:
Presentation at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
Poster at Open Repositories 2014, Helsinki, Finland, June 9-13, 2014
Abstract:
Digitalization has been predicted to change the future as a growing range of non-routine tasks will be automated, offering new kinds of business models for enterprises. Service-oriented architecture (SOA) provides a basis for designing and implementing well-defined problems as reusable services, allowing computers to execute them. Service-oriented design has the potential to act as a mediator between IT and human resources, but enterprises struggle with their SOA adoption and lack a linkage between the benefits and costs of services. This thesis studies the phenomenon of service reuse in enterprises, proposing an ontology that conceptually links different kinds of services with their role as part of the business model. The proposed ontology was created on the basis of qualitative research conducted in three large enterprises. Service reuse has two roles in enterprises: it enables automated data sharing among human and IT resources, and it may provide cost savings in service development and operations. From a technical viewpoint, the ability to define a business problem as a service is one of the key enablers for achieving service reuse. The research proposes two service identification methods: the first identifies prospective services in the existing documentation of the enterprise, and the second models the services from a functional viewpoint, supporting service identification sessions with business stakeholders.
Abstract:
Advancements in information technology have made it possible for organizations to gather and store vast amounts of data about their customers. Information stored in databases can be highly valuable for organizations. However, analyzing large databases has proven to be difficult in practice. For companies in the retail industry, customer intelligence can be used to identify profitable customers, their characteristics, and their behavior. By clustering customers into homogeneous groups, companies can manage their customer base more effectively and target profitable customer segments. This thesis studies the use of the self-organizing map (SOM) as a method for analyzing large customer datasets, clustering customers, and discovering information about customer behavior. The aim of the thesis is to find out whether the SOM could be a practical tool for retail companies to analyze their customer data.
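To make the clustering idea above concrete, here is a minimal sketch of a self-organizing map in plain Python. It is not the thesis's implementation: the grid size, the decay schedules, and the two customer features (normalized annual spend and visit frequency) are all assumptions chosen for illustration, and the customer records are invented.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position of the unit whose weights are closest to x."""
    return min(
        weights,
        key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(weights[u], x)),
    )


def train_som(data, rows=1, cols=2, epochs=200, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM and return {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise every unit from a randomly chosen sample.
    weights = {
        (r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.1)   # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):     # one shuffled pass over the data
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights


# Hypothetical, already-normalized customers: (annual spend, visits per month).
customers = [
    (0.10, 0.15), (0.12, 0.10), (0.08, 0.20), (0.15, 0.12),  # low spend, few visits
    (0.90, 0.85), (0.88, 0.95), (0.92, 0.80), (0.85, 0.90),  # high spend, many visits
]
som = train_som(customers)
segments = [best_matching_unit(som, c) for c in customers]
```

Each entry of `segments` is the map unit a customer falls on, so customers that share a unit can be treated as one segment. A realistic study would use a larger grid and many more features.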
Abstract:
Lasers play an important role in medical, sensor and data storage devices. This thesis focuses on the design, technology development, fabrication and characterization of hybrid ultraviolet Vertical-Cavity Surface-Emitting Lasers (UV VCSEL) with organic laser-active material and inorganic distributed Bragg reflectors (DBR). Multilayer structures with different layer thicknesses, refractive indices and absorption coefficients of the inorganic materials were studied using theoretical model calculations. During the simulations, structure parameters such as materials and thicknesses were varied. This procedure was repeated several times during the design optimization process, also incorporating feedback from technology and characterization. Two types of VCSEL devices were investigated. The first is an index-coupled structure consisting of bottom and top DBR dielectric mirrors. The space between them is the cavity, which includes the active region and defines the spectral gain profile. In this configuration the maximum electric field is concentrated in the cavity and can destroy the chemical structure of the active material. The second type of laser is a so-called complex-coupled VCSEL. In this structure the active material is placed not only in the cavity but also in parts of the DBR structure. The simulations show that such a distribution of the active material reduces the pumping power required to reach the lasing threshold. High efficiency is achieved by substituting the dielectric material with high refractive index for the periods closer to the cavity. The inorganic materials for the DBR mirrors were deposited by Plasma-Enhanced Chemical Vapor Deposition (PECVD) and Dual Ion Beam Sputtering (DIBS) machines. Extended optimizations of the technological processes were performed. All processes were carried out in a clean room (Class 1 and Class 10000). 
The optical properties and the thicknesses of the layers were measured in situ by spectroscopic ellipsometry and spectroscopic reflectometry. The surface roughness was analyzed by atomic force microscopy (AFM), and images of the devices were taken with a scanning electron microscope (SEM). The silicon dioxide (SiO2) and silicon nitride (Si3N4) layers deposited by the PECVD machine show defects in the material structure and have higher absorption in the ultraviolet range than layers produced by ion beam deposition (IBD). This results in low reflectivity of the DBR mirrors and also degrades the optical properties of the VCSEL devices. However, PECVD has the advantage that the stress in the layers can be tuned and compensated, which is currently not possible with IBD. An Ionsys 1000 sputtering machine, produced by Roth&Rau, was used for the deposition of silicon dioxide (SiO2), silicon nitride (Si3N4), aluminum oxide (Al2O3) and zirconium dioxide (ZrO2). The chamber is equipped with main (sputter) and assist ion sources. The dielectric materials were optimized by introducing additional oxygen and nitrogen into the chamber. DBR mirrors with different material combinations were deposited. The measured optical properties of the fabricated multilayer structures show excellent agreement with the results of the theoretical model calculations. The layers deposited by sputtering show high compressive stress. A novel organic material with spiro-linked molecules is used as the active region. Two different materials were evaporated using a dye evaporation machine in the clean room of the department Makromolekulare Chemie und Molekulare Materialien (mmCmm). The Spiro-Octopus-1 organic material has its maximum emission at the wavelength λemission = 395 nm, and the Spiro-Pphenal has its maximum emission at the wavelength λemission = 418 nm. Both have a high refractive index and can be combined with low refractive index materials such as silicon dioxide (SiO2). 
The sputtering method yields excellent optical quality of the deposited materials and high reflection of the multilayer structures. The bottom DBR mirrors for all VCSEL devices were deposited by the DIBS machine, whereas the top DBR mirror was deposited either by PECVD or by a combination of PECVD and DIBS. The fabricated VCSEL structures were optically pumped by a nitrogen laser at the wavelength λpumping = 337 nm. The emission was measured with a spectrometer. Emission from the VCSEL structures was observed at wavelengths of 392 nm and 420 nm.
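As a rough illustration of how a DBR mirror is dimensioned, the sketch below uses the standard textbook expressions for a lossless quarter-wave stack at normal incidence: each layer has thickness λ/(4n), and the peak reflectivity of N layer pairs follows from the refractive indices alone. This is not the model used in the thesis. It ignores absorption, which the abstract identifies as significant for PECVD layers in the ultraviolet, and the refractive indices and the substrate index are assumed round numbers, not measured values from the source. Only the 395 nm design wavelength is taken from the text.

```python
def quarter_wave_thickness(wavelength_nm, n):
    """Physical thickness (nm) of a quarter-wave layer with refractive index n."""
    return wavelength_nm / (4.0 * n)


def dbr_peak_reflectivity(n_high, n_low, pairs, n_incident=1.0, n_substrate=1.5):
    """Peak reflectivity of a lossless quarter-wave stack with `pairs` layer pairs.

    Textbook formula for normal incidence; n_incident and n_substrate are the
    indices of the media in front of and behind the stack.
    """
    a = n_incident * n_low ** (2 * pairs)
    b = n_substrate * n_high ** (2 * pairs)
    return ((a - b) / (a + b)) ** 2


# Design wavelength from the abstract (Spiro-Octopus-1 emission maximum).
DESIGN_WAVELENGTH_NM = 395.0
# Assumed, illustrative indices for a high/low index pair (not from the source).
N_HIGH = 2.0
N_LOW = 1.47

for pairs in (1, 5, 10):
    r = dbr_peak_reflectivity(N_HIGH, N_LOW, pairs)
    print(f"{pairs:2d} pairs: peak reflectivity = {r:.4f}")

print("high-index layer thickness (nm):",
      quarter_wave_thickness(DESIGN_WAVELENGTH_NM, N_HIGH))
print("low-index layer thickness (nm):",
      quarter_wave_thickness(DESIGN_WAVELENGTH_NM, N_LOW))
```

The reflectivity approaches 1 as pairs are added, and it does so faster for a larger index contrast. This is one reason the choice of material combination matters for the mirrors.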
Abstract:
The furious pace of Moore's Law is driving computer architecture into a realm where the speed of light is the dominant factor in system latencies. The number of clock cycles needed to span a chip is increasing, while the number of bits that can be accessed within a clock cycle is decreasing. Hence, it is becoming more difficult to hide latency. One alternative solution is to reduce latency by migrating threads and data, but the overhead of existing implementations has so far made migration an unserviceable solution. I present an architecture, implementation, and mechanisms that reduce the overhead of migration to the point where migration is a viable supplement to other latency hiding mechanisms, such as multithreading. The architecture is abstract, and presents programmers with a simple, uniform fine-grained multithreaded parallel programming model with implicit memory management. In other words, the spatial nature and implementation details (such as the number of processors) of a parallel machine are entirely hidden from the programmer. Compiler writers are encouraged to devise programming languages for the machine that guide a programmer to express their ideas in terms of objects, since objects exhibit an inherent physical locality of data and code. The machine implementation can then leverage this locality to automatically distribute data and threads across the physical machine by using a set of high performance migration mechanisms. An implementation of this architecture could migrate a null thread in 66 cycles -- over a factor of 1000 improvement over previous work. Performance also scales well; the time required to move a typical thread is only 4 to 5 times that of a null thread. Data migration performance is similar, and scales linearly with data block size. 
Since the performance of the migration mechanism is on par with that of an L2 cache, the implementation simulated in my work has no data caches and relies instead on multithreading and the migration mechanism to hide and reduce access latencies.
Abstract:
The COntext INterchange (COIN) strategy is an approach to solving the problem of interoperability of semantically heterogeneous data sources through context mediation. COIN has used its own notation and syntax for representing ontologies. More recently, the OWL Web Ontology Language has become established as the W3C-recommended ontology language. We propose the use of the COIN strategy to solve context disparity and ontology interoperability problems in the emerging Semantic Web – both at the ontology level and at the data level. In conjunction with this, we propose a version of the COIN ontology model that uses OWL and the emerging rules interchange language, RuleML.
Abstract:
A 4-minute video that shows how students with dyslexia or visual stress can change the text and background colours in Adobe Acrobat Reader to suit their needs.
Abstract:
Resources from the Singapore Summer School 2014 hosted by NUS. ws-summerschool.comp.nus.edu.sg
Abstract:
Speaker: Dr Kieron O'Hara Organiser: Time: 04/02/2015 11:00-11:45 Location: B32/3077 Abstract: In order to reap the potential societal benefits of big and broad data, it is essential to share and link personal data. However, privacy and data protection considerations mean that, to be shared, personal data must be anonymised, so that the data subject cannot be identified from the data. Anonymisation is therefore a vital tool for data sharing, but deanonymisation, or reidentification, is always possible given sufficient auxiliary information (and as the amount of data grows, both in terms of creation and in terms of availability in the public domain, the probability of finding such auxiliary information grows). This creates issues for the management of anonymisation, which are exacerbated not only by uncertainties about the future, but also by misunderstandings about the process(es) of anonymisation. This talk discusses these issues in relation to privacy, risk management and security, reports on recent theoretical tools created by the UKAN network of statistics professionals (of which the author is one of the leads), and asks how long anonymisation can remain a useful tool, and what might replace it.
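The point about auxiliary information can be illustrated with a small linkage sketch. This is not one of the tools mentioned in the talk; it is a generic example, and all records, names, and attribute choices below are invented. An "anonymised" record counts as reidentified when its combination of attribute values matches exactly one record in the release and exactly one named person in the auxiliary data.

```python
from collections import defaultdict


def reidentify(anonymised, auxiliary, quasi_identifiers):
    """Link anonymised records to named people via shared attribute values.

    Returns {name: anonymised record} for every unambiguous one-to-one match.
    """
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    anon_by_key = defaultdict(list)
    for record in anonymised:
        anon_by_key[key(record)].append(record)

    aux_by_key = defaultdict(list)
    for person in auxiliary:
        aux_by_key[key(person)].append(person)

    matches = {}
    for k, records in anon_by_key.items():
        people = aux_by_key.get(k, [])
        if len(records) == 1 and len(people) == 1:
            matches[people[0]["name"]] = records[0]
    return matches


# Hypothetical release with names removed.
anonymised = [
    {"postcode": "SO17", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "SO17", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "SO16", "birth_year": 1980, "sex": "F", "diagnosis": "migraine"},
]
# Hypothetical auxiliary data available to an attacker.
auxiliary = [
    {"name": "Alice Example", "postcode": "SO17", "birth_year": 1980, "sex": "F"},
    {"name": "Bob Example", "postcode": "SO17", "birth_year": 1975, "sex": "M"},
    {"name": "Carol Example", "postcode": "SO16", "birth_year": 1980, "sex": "F"},
]

with_little_aux = reidentify(anonymised, auxiliary, ["postcode"])
with_more_aux = reidentify(anonymised, auxiliary, ["postcode", "birth_year", "sex"])
```

With the postcode alone only one person is uniquely linked. Once birth year and sex are also known, every record in this toy release is linked to a name. This is the sense in which more auxiliary information raises reidentification risk.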
Abstract:
Abstract: A frequent assumption in Social Media is that its open nature leads to a representative view of the world. In this talk we want to consider bias occurring in the Social Web. We will consider a case study of LiquidFeedback, a direct democracy platform of the German Pirate Party, as well as models of (non-)discriminating systems. As a conclusion of this talk we stipulate the need for Social Media systems to bias their workings according to social norms and to publish the bias they introduce. Speaker Biography: Prof Steffen Staab. Steffen studied computer science and computational linguistics in Erlangen (Germany), Philadelphia (USA) and Freiburg (Germany). Afterwards he worked as a researcher at Univ. Stuttgart/Fraunhofer and Univ. Karlsruhe, before he became a professor in Koblenz (Germany). Since March 2015 he has also held a chair for Web and Computer Science at Univ. of Southampton, sharing his time between here and Koblenz. In his research career he has managed to avoid almost all the good advice that he now gives to his team members. Such advice includes focusing on research (vs. company) or concentrating on only one or two research areas (vs. considering ontologies, semantic web, social web, data engineering, text mining, peer-to-peer, multimedia, HCI, services, software modelling and programming and some more). Though, actually, improving how we understand and use text and data is a good common denominator for a lot of Steffen's professional activities.
Abstract:
Abstract taken in part from the publication itself
Abstract:
The implementation of European Directive 91/271/EEC on urban wastewater treatment promoted the construction of new facilities along with the introduction of new technologies for treating nutrients in areas designated as sensitive. Both the design of these new infrastructures and the redesign of existing ones were carried out using approaches based mainly on economic objectives, owing to the need to complete the works within a relatively short period. These studies were based on heuristic knowledge or on numerical correlations derived from simplified deterministic models. As a result, many of the resulting wastewater treatment plants (WWTPs) were characterised by a lack of robustness and flexibility, poor controllability, frequent microbiological solids-separation problems in the secondary settler, high operating costs, and only partial nutrient removal, all of which kept them far from optimal operation. Many of these problems arose from inadequate design, and the scientific community came to recognise the importance of the early stages of conceptual design. For precisely this reason, traditional design methods must evolve towards more complex evaluation systems that take multiple objectives into account, thereby ensuring better plant performance. Despite the importance of conceptual design that accounts for multiple objectives, there is still a significant gap in the scientific literature on this field of research. The objective of this thesis is to develop a conceptual design method for WWTPs that considers multiple objectives, so that it serves as a decision-support tool for selecting the best alternative among different design options. 
This research contributes a modular, evolutionary design method that combines different techniques: the hierarchical decision process, multicriteria analysis, preliminary multiobjective optimisation based on sensitivity analysis, knowledge extraction and data mining techniques, multivariate analysis, and uncertainty analysis based on Monte Carlo simulations. This was achieved by subdividing the design method developed in this thesis into four main blocks: (1) hierarchical generation and multicriteria analysis of alternatives, (2) analysis of critical decisions, (3) multivariate analysis, and (4) uncertainty analysis. The first block combines a hierarchical decision process with multicriteria analysis. The hierarchical decision process subdivides the conceptual design into a series of questions that are easier to analyse and evaluate, while multicriteria analysis allows different objectives to be considered at the same time. This reduces the number of alternatives to be evaluated and ensures that the future design and operation of the plant are influenced by environmental, economic, technical, and legal aspects. Finally, this block includes a sensitivity analysis of the weights, which shows how the different alternatives vary as the relative importance of the design objectives changes. The second block comprises sensitivity analysis, preliminary multiobjective optimisation, and knowledge extraction techniques to support the conceptual design of WWTPs, selecting the best alternative once the critical decisions have been identified. Critical decisions are those in which a choice must be made between alternatives that satisfy the design objectives to a similar degree but have different implications for the future structure and operation of the plant. 
This type of analysis provides a broader view of the design space and makes it possible to identify desirable (or undesirable) directions in which the design process may move. The third block of the thesis provides the multivariate analysis of the multicriteria matrices obtained during the evaluation of the design alternatives. Specifically, the techniques used in this research comprise: 1) cluster analysis, 2) principal component analysis/factor analysis, and 3) discriminant analysis. As a result, the data become more accessible for selecting among the alternatives, providing more information for a more effective evaluation and ultimately increasing knowledge of the process of evaluating the design alternatives generated. In the fourth and final block developed in this thesis, the different design alternatives are evaluated under uncertainty. The aim of this block is to study how decision making changes when an alternative is evaluated with or without uncertainty in the parameters of the models that describe its behaviour. Uncertainty in the model parameters is introduced by means of probability functions. Monte Carlo simulations are then carried out, in which random numbers are drawn from these distributions and substituted for the model parameters, making it possible to study how the uncertainty propagates through the model. It is thus possible to analyse the variation in the overall fulfilment of the design objectives for each alternative, the contributions of environmental, legal, economic, and technical aspects to this variation, and finally the change in the selection of alternatives when the relative importance of the design objectives varies. Compared with traditional design approaches, the method developed in this thesis addresses design/redesign problems taking into account multiple objectives and multiple criteria. 
At the same time, the decision-making process shows objectively, transparently, and systematically why one alternative is selected over the others, providing the option that best fulfils the stated objectives, showing its strengths and weaknesses and the main correlations between objectives and alternatives, and finally taking into account the possible uncertainty inherent in the model parameters used during the analyses. The capabilities of the method are demonstrated in this thesis through different case studies: selection of the type of biological nitrogen removal (case study #1), optimisation of a control strategy (case study #2), redesign of a plant to achieve simultaneous removal of carbon, nitrogen, and phosphorus (case study #3), and finally analysis of plant-wide control strategies (case studies #4 and #5).
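The interplay of multicriteria scoring and Monte Carlo uncertainty analysis described in this abstract can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the method of the thesis. The two alternatives, their criterion scores, the weights, and the Gaussian spread are all invented, and the uncertainty is applied directly to the criterion scores as a stand-in for propagating uncertain parameters through a plant model.

```python
import random


def weighted_score(scores, weights):
    """Weighted-sum value of one alternative over all criteria."""
    return sum(weights[c] * scores[c] for c in weights)


def monte_carlo_selection(alternatives, weights, spread, runs=2000, seed=1):
    """Return the fraction of runs in which each alternative scores best.

    In every run each nominal criterion score is replaced by a draw from a
    normal distribution centred on it (standard deviation `spread`).
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in alternatives}
    for _ in range(runs):
        sampled = {
            name: weighted_score(
                {c: rng.gauss(mu, spread) for c, mu in scores.items()}, weights
            )
            for name, scores in alternatives.items()
        }
        wins[max(sampled, key=sampled.get)] += 1
    return {name: count / runs for name, count in wins.items()}


# Hypothetical normalized scores (0 = worst, 1 = best) for two design options.
alternatives = {
    "alternative_1": {"environmental": 0.8, "economic": 0.6, "technical": 0.7, "legal": 0.7},
    "alternative_2": {"environmental": 0.6, "economic": 0.8, "technical": 0.6, "legal": 0.6},
}
# Hypothetical relative importance of the design objectives (sums to 1).
weights = {"environmental": 0.3, "economic": 0.3, "technical": 0.2, "legal": 0.2}

without_uncertainty = monte_carlo_selection(alternatives, weights, spread=0.0)
with_uncertainty = monte_carlo_selection(alternatives, weights, spread=0.15)
```

Without uncertainty the same alternative is selected in every run. With uncertainty the other alternative is selected in a share of the runs, which is the kind of change in the decision that the fourth block sets out to study. Re-running the function with different `weights` gives a simple form of the weight sensitivity analysis mentioned in the first block.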