996 results for Data cleaning


Relevance:

70.00%

Publisher:

Abstract:

Supernovae are among the most energetic events occurring in the universe and are so far the only verified extrasolar source of neutrinos. As the explosion mechanism is still not well understood, recording a burst of neutrinos from such a stellar explosion would be an important benchmark for particle physics as well as for the core collapse models. The neutrino telescope IceCube is located at the Geographic South Pole and monitors the Antarctic glacier for Cherenkov photons. Even though it was conceived for the detection of high energy neutrinos, it is capable of identifying a burst of low energy neutrinos ejected from a supernova in the Milky Way by exploiting the low photomultiplier noise in the Antarctic ice and extracting a collective rate increase. A signal Monte Carlo specifically developed for water Cherenkov telescopes is presented. With its help, we will investigate how well IceCube can distinguish between core collapse models and oscillation scenarios. In the second part, nine years of data taken with the IceCube precursor AMANDA will be analyzed. Intensive data cleaning methods will be presented along with a background simulation. From the result, an upper limit on the expected occurrence of supernovae within the Milky Way will be determined.

Relevance:

70.00%

Publisher:

Abstract:

Development and analysis of a sample dataset of about 3 million entries, extracted from a data warehouse of information on the energy consumption of various smart homes.

Relevance:

70.00%

Publisher:

Abstract:

Increasing amounts of clinical research data are collected by manual data entry into electronic source systems and directly from research subjects. For this manually entered source data, common methods of data cleaning, such as post-entry identification and resolution of discrepancies and double data entry, are not feasible. However, data error rates achieved without these mechanisms may be higher than desired for a particular research use. We evaluated a heuristic usability method for utility as a tool to independently and prospectively identify data collection form questions associated with data errors. The method evaluated had a promising sensitivity of 64% and a specificity of 67%. The method was used as described in the literature for usability with no further adaptations or specialization for predicting data errors. We conclude that usability evaluation methodology should be further investigated for use in data quality assurance.
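As a reminder of what the two reported figures measure, the following is a minimal sketch of how sensitivity and specificity can be computed when a screening method flags form questions and those flags are compared with the questions that actually produced data errors. The function name and the sample data are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(flagged, erroneous):
    """Compare screening flags with observed errors.

    flagged[i]   -- True if the method flagged question i (hypothetical input)
    erroneous[i] -- True if question i was actually associated with data errors
    Returns (sensitivity, specificity).
    """
    pairs = list(zip(flagged, erroneous))
    true_pos = sum(1 for f, e in pairs if f and e)
    false_neg = sum(1 for f, e in pairs if not f and e)
    true_neg = sum(1 for f, e in pairs if not f and not e)
    false_pos = sum(1 for f, e in pairs if f and not e)
    sensitivity = true_pos / (true_pos + false_neg)  # share of error-prone questions caught
    specificity = true_neg / (true_neg + false_pos)  # share of clean questions left unflagged
    return sensitivity, specificity


# Illustrative data only: eight form questions.
flagged = [True, True, False, True, False, False, True, False]
erroneous = [True, True, True, False, False, False, False, False]
sens, spec = sensitivity_specificity(flagged, erroneous)
```

A sensitivity of 64% would mean that roughly two out of three error-prone questions were flagged in advance; a specificity of 67% would mean that about one in three clean questions was flagged unnecessarily.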

Relevance:

60.00%

Publisher:

Abstract:

The paper investigates the occurrence of non-injury incidents among cyclists in the UK, seeking to (i) generate a rate that can be compared with injury rates, (ii) analyse factors affecting incident rates, and (iii) analyse factors affecting the impact of incidents on cyclists. We collected data on non-injury cycling ‘incidents’ (near misses and other frightening and/or annoying incidents) from 1692 online diaries of cycle trip stages and incidents, participants having signed up in advance for a specific day. Following data cleaning and coding, a dataset was created covering 1532 diary days and 3994 records of incidents occurring within the UK. Incident rates were calculated and compared to injury risks for cyclists. Cross-tabulation and regression were used to identify factors affecting incident rates and the effect an incident has on the cyclist. Frightening or annoying non-injury incidents, unlike slight injuries, are an everyday experience for most people cycling in the UK. For regular cyclists ‘very scary’ incidents (rated as 3 on a 0–3 scale) are on average a weekly experience, with deliberate aggression experienced monthly. Per mile, non-injury incidents were more frequent for people making shorter and slower trips. People aged over 55 were at lower risk, as were those cycling at the weekend and outside the morning peak. Incidents that involved motor vehicles, especially those involving larger vehicles, were more frightening than those that did not. Near miss and other non-injury incidents are widespread in the UK and may have a substantial impact on cycling experience and uptake. Policy and research should initially target the most frightening types of incident, such as very close passes and incidents involving large vehicles. Further attention needs to be paid to the experiences of groups under-represented among cyclists, such as women making shorter trips.

Relevance:

60.00%

Publisher:

Abstract:

The effect of a prolonged period of strongly northward Interplanetary Magnetic Field (IMF) on the high-latitude F-region is studied using data from the EISCAT Common Programme Zero mode of operation on 11–12 August 1982. The analysis of the raw autocorrelation functions is kept to the directly derived parameters Ne, Te, Ti and velocity, and limits are defined for the errors introduced by assumptions about ion composition and by changes in the transmitted power and system constant. Simple data-cleaning criteria are employed to eliminate problems due to coherent signals and large background noise levels. The observed variations in plasma densities, temperatures and velocities are interpreted in terms of supporting data from ISEE-3 and local riometers and magnetometers. Both field-aligned and field-perpendicular plasma flows at Tromsø showed effects of the northward IMF: convection was slow and irregular and field-aligned flow profiles were characteristic of steady-state polar wind outflow with flux of order 10¹² m⁻² s⁻¹. This period followed a strongly southward IMF which had triggered a substorm. The substorm gave enhanced convection, with a swing to equatorward flow and large (5 × 10¹² m⁻² s⁻¹), steady-state field-aligned fluxes, leading to the possibility of O⁺ escape into the magnetosphere. The apparent influence of the IMF over both field-perpendicular and field-aligned flows is explained in terms of the cross-cap potential difference and the location of the auroral oval.

Relevance:

60.00%

Publisher:

Abstract:

The thesis aims to elaborate on the optimum trigger speed for Vehicle Activated Signs (VAS) and to study the effect of VAS trigger speed on drivers’ behaviour. VAS are speed warning signs that are activated by an individual vehicle when the driver exceeds a speed threshold. The threshold that triggers the VAS is commonly based on driver speed and is accordingly called the trigger speed. At present, the trigger speed activating the VAS is usually set to a constant value and does not consider the fact that an optimal trigger speed might exist. The optimal trigger speed significantly impacts driver behaviour. To fulfil the aims of this thesis, systematic vehicle speed data were collected from field experiments that utilized Doppler radar. Calibration methods for the radar used in the experiment were also developed and evaluated to provide accurate data for the experiment. The calibration method was bidirectional, consisting of data cleaning and data reconstruction. The calibration based on data cleaning performed better than the calibration based on the reconstructed data. To study the effect of trigger speed on driver behaviour, the collected data were analysed by both descriptive and inferential statistics. Both showed that the change in trigger speed had an effect on the mean vehicle speed and on the standard deviation of vehicle speed. When the trigger speed was set near the speed limit, the standard deviation was high. Therefore, the choice of trigger speed cannot be based solely on the speed limit at the proposed VAS location. The optimal trigger speeds for VAS were not considered in previous studies. In addition, the relationship between the trigger value and its consequences under different conditions was not clearly stated.
The finding from this thesis is that the optimal trigger speed should be primarily based on lowering the standard deviation rather than lowering the mean speed of vehicles. Furthermore, the optimal trigger speed should be set near the 85th percentile speed, with the goal of lowering the standard deviation.
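The two quantities the thesis relies on, the 85th percentile speed and the standard deviation of speed, can be sketched with the Python standard library as follows. All speeds, trigger values, and function names below are hypothetical illustrations; the thesis's own data and procedure are not reproduced here.

```python
import statistics


def speed_summary(speeds_kmh):
    """Return mean, sample standard deviation, and 85th percentile speed."""
    # quantiles(n=100) returns 99 cut points; index 84 is the 85th percentile.
    cut_points = statistics.quantiles(speeds_kmh, n=100, method="inclusive")
    return {
        "mean": statistics.mean(speeds_kmh),
        "stdev": statistics.stdev(speeds_kmh),
        "p85": cut_points[84],
    }


def lowest_spread_trigger(speeds_by_trigger):
    """Pick the trigger speed whose observed speeds have the smallest stdev."""
    return min(
        speeds_by_trigger,
        key=lambda trigger: statistics.stdev(speeds_by_trigger[trigger]),
    )


# Hypothetical baseline measurements (km/h) at a candidate VAS site.
baseline = [42, 45, 47, 48, 50, 51, 52, 53, 55, 58, 61, 64]
summary = speed_summary(baseline)

# Hypothetical speeds observed under three candidate trigger speeds.
speeds_by_trigger = {
    50: [44, 47, 49, 52, 55, 60],
    55: [46, 48, 50, 51, 53, 56],
    60: [45, 49, 52, 55, 59, 63],
}
best = lowest_spread_trigger(speeds_by_trigger)
```

Selecting on the standard deviation rather than the mean mirrors the stated finding; a real evaluation would also need to account for site conditions and sample size.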

Relevance:

60.00%

Publisher:

Abstract:

In conventional content-based image retrieval (CBIR) employing relevance feedback, one implicit assumption is that both pure positive and negative examples are available. However, this is not always true in practical applications of CBIR. In this paper, we address a new problem of image retrieval using several unclean positive examples, named noisy query, in which some mislabeled or weakly relevant images are present. The proposed image retrieval scheme measures the image similarity by combining multiple feature distances. Incorporating data cleaning and a noise-tolerant classifier, a two-step strategy is proposed to handle noisy positive examples. Experiments carried out on a subset of the Corel image collection show that the proposed scheme outperforms the competing image retrieval schemes.

Relevance:

60.00%

Publisher:

Abstract:

Conventional content-based image retrieval (CBIR) schemes employing relevance feedback may suffer from some problems in practical applications. First, most ordinary users would like to complete their search in a single interaction, especially on the web. Second, it is time-consuming and difficult to label a lot of negative examples with sufficient variety. Third, ordinary users may introduce some noisy examples into the query. This correspondence explores solutions to a new issue: image retrieval using unclean positive examples. In the proposed scheme, multiple feature distances are combined to obtain image similarity using classification technology. To handle the noisy positive examples, a new two-step strategy is proposed by incorporating the methods of data cleaning and a noise-tolerant classifier. Extensive experiments carried out on two different real image collections validate the effectiveness of the proposed scheme.

Relevance:

60.00%

Publisher:

Abstract:

While SQL injection attacks have been plaguing web applications for years, the threat they pose to RFID systems has only been identified recently. Because the architectures of web systems and RFID systems differ considerably, the prevention and detection techniques proposed for web applications are not suitable for RFID systems. In this paper, we propose a system to secure RFID systems against tag-based SQL injection attacks (SQLIA). Our system is optimized for the architecture of RFID systems and consists of a query structure matching technique and a tag data cleaning technique. The novelty of the proposed system is that it is specifically aimed at RFID systems and can detect and prevent second-order injections, a problem that most current solutions have not addressed. The preliminary evaluation of our query matching technique is very promising, showing a very high detection rate with minimal false positives.

Relevance:

60.00%

Publisher:

Abstract:

Approximate instance matching is a central problem in many data management processes, such as data integration, data cleaning, and approximate querying. The main goal of approximate matching is to determine whether two instances represent the same real-world object. For atomic values, several similarity functions have been defined, which generally depend on the value domain. On the other hand, matching aggregate values, such as tuples or XML trees, remains an important problem. In this scenario, two problems can be identified. The first concerns how the results generated by different similarity functions should be combined into a single score, or into a normalized score. Individual functions generally produce scores that are not comparable, since each function may yield a different score distribution. This means that there is no simple way to combine scores generated by distinct similarity functions using a simple measure in aggregate matching. In this thesis, the proposal is, instead of using the scores originally generated by the similarity functions, to apply a method that estimates the precision of each function's results and to use this estimated precision as an adjusted score. Through this method, the proposal presented in this thesis makes two contributions to this problem. First, it allows the user to specify meaningful cutoff values (thresholds), using an adjusted precision value as a similarity score. In addition, using the adjusted score yields more precise results in an approximate aggregate matching process. The second problem arises when scores are combined in aggregate matching, and concerns the similarity function used to combine the values. In particular, an aggregate can be structured in different ways, such as a tuple, a set, or a list. The combination process used in each case should be distinct in order to achieve more accurate results. However, it is not clear how individual similarity scores can be combined to compute appropriate scores for an aggregate. The contribution presented for this problem is the definition of specific similarity functions for each type of aggregate, depending on its structure. Keywords: Similarity, similarity functions, instance matching, recall and precision.

Relevance:

60.00%

Publisher:

Abstract:

To ensure greater reliability and consistency of the data stored in a database, the data cleaning stage is placed early in the Knowledge Discovery in Databases (KDD) process and is responsible for eliminating problems and adjusting the data for the later stages, especially the data mining stage. Such problems occur at the instance and schema levels and include missing values, null values, duplicate tuples, and values outside the domain, among others. Several algorithms have been developed to perform the cleaning step in databases; some of them were developed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, this work presents as its original contribution an optimization of a phonetic-based algorithm for detecting duplicate tuples in databases, using multithreading and requiring no training data, together with a language-independent environment to support it. © 2011 IEEE.
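The abstract does not name the phonetic algorithm it optimizes. Purely as an illustration of phonetic duplicate detection, the sketch below substitutes the classic American Soundex code and groups names that share a code, computing the codes in a thread pool. The function names and sample names are hypothetical, and in CPython a thread pool mainly helps when the per-record work involves I/O, so this shows the structure, not a performance claim.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Soundex digit for each consonant group; vowels, Y, H and W carry no digit.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the four-character American Soundex code (ASCII letters assumed)."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        if ch not in "HW":  # H and W do not separate letters with the same code
            prev = code
    return (encoded + "000")[:4]


def find_phonetic_duplicates(names, workers=4):
    """Group names whose Soundex codes match; return groups with 2+ members."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        keys = list(pool.map(soundex, names))
    groups = defaultdict(list)
    for name, key in zip(names, keys):
        groups[key].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Illustrative input only.
names = ["Robert", "Rupert", "Smith", "Smyth", "Jones"]
duplicates = find_phonetic_duplicates(names)
```

Because Soundex is a fixed rule set, no training data is needed, which matches one property the abstract highlights; candidate groups would still need a second comparison step before tuples are merged.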

Relevance:

60.00%

Publisher:

Abstract:

Graduate Program in Mechanical Engineering - FEG

Relevance:

60.00%

Publisher:

Abstract:

Aim of the study

Due to the valuable contribution made by volunteers to sporting events, a better understanding of volunteers’ motivation is imperative for event managers in order to develop effective volunteer recruitment and retention strategies. The adaptation of working conditions and task domains to the motives and needs of volunteers is one of the key challenges in volunteer management. Conversely, ignorance of the motives and needs of volunteers could negatively affect their performance and attitude, which will have negative consequences for the execution of events (Strigas & Jackson, 2003). In general, the motives of volunteers are located on a continuum between selflessness (e.g. helping others) and self-interest (e.g. pursuing one’s own interests). Furthermore, it should be taken into account that volunteers may be motivated by more than one need or goal and may therefore configure different bundles of motives, resulting in heterogeneous types of motives for voluntary engagement (Dolnicar & Randle, 2007). Despite the extensive number of studies on the motives of sport event volunteers, only a few studies focus on the analysis of individual motive profiles concerning volunteering. Accordingly, we will take a closer look at the following questions: To what extent do volunteers at sporting events differ in the motives of their engagement, and how can the volunteers be adequately classified?

Theoretical Background

According to the functional approach, relevant subjective motives are related to the outcomes and consequences that volunteering is supposed to lead to and to produce. This means that individuals’ motives determine which incentives are anticipated in return for volunteering (e.g. increase in social contacts) and are important for engaging in volunteering, e.g. the choice between different opportunities for voluntary activity, or different tasks (Stukas et al., 2009). Additionally, inter-individual differences of motive structures as well as matching motives in the reflections of voluntary activities will be considered by using a person-oriented approach. In the person-oriented approach, it is not the specific variables that are made the entities of investigation, but rather persons with a certain combination of characteristic features (Bergmann et al., 2003). In the field of sports event volunteers, it is therefore essential to take persons as the unit of analysis. Accordingly, individual motive profiles become the object of investigation. The individual motive profiles permit a glimpse of intra-individual differences in the evaluation of different motive areas, and thus represent the real subjective perspective. Hence, a person will compare the importance of individual motives for his behaviour primarily in relation to other motives (e.g. social contacts are more important to me than material incentives), and make fewer comparisons with the assessments of other people.

Methodology, research design and data analysis

The motives of sports event volunteers were analysed in the context of the European Athletics Championships 2014 in Zürich. After data cleaning, the study sample contained a total of 1,169 volunteers, surveyed by an online questionnaire. The VMS-ISE scale developed by Bang and Chelladurai (2009) was used and replicated successfully by a confirmatory factor analysis. Accordingly, all seven factors of the scale were included in the subsequent cluster analysis to determine typical motive profiles of volunteers. Before proceeding with the cluster analysis, an intra-individual standardization procedure (according to Spiel, 1998) was applied to take advantage of the intra-individual relationships between the motives of the volunteers. Intra-individual standardization means that every value of each motive dimension was related to the average individual level of expectations. In the final step, motive profiles were determined using a hierarchic cluster analysis based on Ward’s method with squared Euclidean distances.

Results, discussion and implications

The results reveal that motivational processes differ among sports event volunteers, and that volunteers sometimes combine contradictory bundles of motives. In our study, four different volunteer motive profiles were identified and described by their positive levels on the individual motive dimension: the community supporters, the material incentive seekers, the social networkers, and the career and personal growth pursuers. To describe the four identified motive profiles in more detail and to externally validate them, the clusters were analysed in relation to socio-economic, sport-related, and voluntary work characteristics. This motive-based typology of sports event volunteers can provide valuable guidance for event managers in order to create distinctive and designable working conditions and tasks at sporting events that should, in relation to a person-oriented approach, be tailored to a wide range of individual prerequisites. Furthermore, specific recruitment procedures and appropriate communication measures can be defined in order to approach certain groups of potential volunteers more effectively.

References

Bang, H., & Chelladurai, P. (2009). Development and validation of the volunteer motivations scale for international sporting events (VMS-ISE). International Journal of Sport Management and Marketing, 6, 332-350.
Bergmann, L. R., Magnusson, D., & El-Khouri, B. M. (2003). Studying individual development in an interindividual context. Mahwah, NJ: Erlbaum.
Dolnicar, S., & Randle, M. (2007). What motivates which volunteers? Psychographic heterogeneity among volunteers in Australia. Voluntas, 18, 135-155.
Spiel, C. (1998). Four methodological approaches to the study of stability and change in development. Methods of Psychological Research Online, 3, 8-22.
Stukas, A. A., Worth, K. A., Clary, E. G., & Snyder, M. (2009). The matching of motivations to affordances in the volunteer environment: an index for assessing the impact of multiple matches on volunteer outcomes. Nonprofit and Voluntary Sector Quarterly, 38, 5-28.
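The intra-individual standardization step described in the methodology can be illustrated with a minimal sketch: each respondent's motive scores are expressed relative to that respondent's own average. The abstract only says values were "related to" the individual average, so plain mean-centering is an assumption here, and the motive names and scores are hypothetical. The subsequent Ward clustering is not shown.

```python
def standardize_within_person(scores):
    """Center one respondent's motive scores on that respondent's own mean.

    scores -- dict mapping a motive dimension to its rating (hypothetical names)
    Returns a dict with the same keys; positive values mark motives that are
    more important than average for this particular person.
    """
    person_mean = sum(scores.values()) / len(scores)
    return {motive: value - person_mean for motive, value in scores.items()}


# Illustrative respondent; dimension names are placeholders, not the scale's factors.
respondent = {
    "social_contacts": 6.0,
    "material_incentives": 2.0,
    "career": 4.0,
}
profile = standardize_within_person(respondent)
```

After this step, two respondents who rate everything high or everything low but share the same ordering of motives end up with similar profiles, which is what makes the profiles suitable input for a cluster analysis of motive patterns.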

Relevance:

60.00%

Publisher:

Abstract:

Online learning systems (OLS) have taken center stage for corporations and educational institutions as a competitive tool in the knowledge economy. The satisfaction construct has received extensive coverage in the information systems literature as an indicator of effectiveness but has been criticized for lack of validity; yet the value construct has been largely ignored, although it has a long history in psychology, sociology, and behavioral science. The purpose of this dissertation is to investigate the value and satisfaction constructs in the context of OLS, and their relationship, as perceived by learners, for the implied effectiveness of OLS. First, a qualitative phase is employed to gather OLS values from learners' focus groups, followed by a pilot phase to refine a proposed instrument, and a main phase to validate the survey. Responses were received from 75 students in four focus groups, 141 in the pilot, and 207 in the main survey. Extensive data cleaning and exploratory factor analysis were done to identify factors of learners' perceived value and satisfaction of OLS. Then, Value-Satisfaction grids and the Learners' Value Index of Satisfaction (LeVIS) were developed as benchmarking tools of OLS. Moreover, Multicriteria Decision Analysis (MCDA) techniques were employed to impute value from satisfaction scores in order to reduce survey response time. The results provided four satisfaction and four value factors with high reliability (Cronbach's α). Moreover, value and satisfaction were found to have low linear and nonlinear correlations, indicating that they are two distinct uncorrelated constructs. This is consistent with the literature. Value-Satisfaction grids and the LeVIS index indicated relatively high effectiveness for technology and support characteristics, relatively low effectiveness for professor's characteristics, while course and learner characteristics indicated average effectiveness. The main contributions of this study include identifying, defining, and articulating the relationship between value and satisfaction constructs as assessment of users' implied IS effectiveness, as well as assessing the accuracy of MCDA procedures to predict value scores, thus reducing by half the survey questionnaire size.
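For readers unfamiliar with the reliability statistic mentioned above, the standard formula for Cronbach's α with k items is α = k/(k−1) · (1 − Σ item variances / variance of total scores). The sketch below applies it to made-up survey responses; the function name and data are hypothetical and unrelated to the dissertation's instrument.

```python
import statistics


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    # Sample variance of each item (column) across respondents.
    item_variances = [statistics.variance(column) for column in zip(*responses)]
    # Sample variance of each respondent's total score.
    total_variance = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Illustrative data only: five respondents answering three items of one factor.
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 2, 3],
    [4, 4, 4],
]
alpha = cronbach_alpha(responses)
```

Values close to 1 indicate that the items of a factor vary together, which is the sense in which the extracted value and satisfaction factors are described as highly reliable.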