999 results for Data cleansing


Relevance:

70.00%

Publisher:

Abstract:

Data in an organisation often contains business secrets that the organisation does not want to release. However, there are occasions when it is necessary for an organisation to release its data, such as when outsourcing work or using the cloud for Data Quality (DQ) related tasks like data cleansing. Currently, there is no mechanism that allows organisations to release their data for DQ tasks while ensuring that it is suitably protected against the disclosure of business-related secrets. The aim of this paper is therefore to present our current progress on determining which methods are able to modify secret data while retaining its DQ problems. So far we have identified the ways in which data swapping and SHA-2 hash function alteration methods can be used to preserve three kinds of DQ problem (missing data, incorrectly formatted values, and domain violations) while minimising the risk of disclosing secrets. © (2012) by the AIS/ICIS Administrative Office. All rights reserved.
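As a rough illustration of the hashing idea in this abstract, the sketch below masks a column with SHA-256 (a member of the SHA-2 family) and leaves missing values untouched, so the missing-data problem is still visible after masking. It is not the authors' method: the names `mask_value` and `mask_column`, the missing-value markers, the salt, and the 16-character truncation are all assumptions made for the example.

```python
import hashlib

# Assumed markers for "missing"; a real dataset may use others (e.g. "N/A").
MISSING = (None, "")

def mask_value(value, salt="demo-salt"):
    """Replace a value with a salted SHA-256 digest, keeping missing values as-is."""
    if value in MISSING:
        return value  # preserve the missing-data DQ problem
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return digest[:16]  # truncated only for readability (an assumption)

def mask_column(values, salt="demo-salt"):
    """Mask every value in a column; equal inputs map to equal outputs."""
    return [mask_value(v, salt) for v in values]

column = ["Alice", "", None, "Bob", "Alice"]
masked = mask_column(column)
print(masked)
```

Because equal inputs produce equal digests, duplicates remain detectable in the masked column. Preserving format or domain violations would need extra handling that this sketch does not attempt.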

Relevance:

60.00%

Publisher:

Abstract:

The increased popularity of mopeds and motor scooters in Australia and elsewhere in the last decade has contributed substantially to the greater use of powered two-wheelers (PTWs) as a whole. As the exposure of mopeds and scooters has increased, so too has the number of reported crashes involving those PTW types, but there is currently little research comparing the safety of mopeds and, particularly, larger scooters with that of motorcycles. This study compared the crash risk and crash severity of motorcycles, mopeds and larger scooters in Queensland, Australia. Comprehensive data cleansing was undertaken to separate motorcycles, mopeds and larger scooters in police-reported crash data covering the five years to 30 June 2008. The crash rates of motorcycles (including larger scooters) and mopeds in terms of registered vehicles were similar over this period, although the moped crash rate showed a stronger downward trend. However, the crash rates in terms of distance travelled were nearly four times higher for mopeds than for motorcycles (including larger scooters). More comprehensive distance travelled data are needed to confirm these findings. The overall severity of moped and scooter crashes was significantly lower than that of motorcycle crashes, but an ordered probit regression model showed that crash severity outcomes were related to differences in crash characteristics and circumstances, rather than differences between PTW types per se. Greater motorcycle crash severity was associated with higher (>80 km/h) speed zones, horizontal curves, weekend, single vehicle and nighttime crashes. Moped crashes were more severe at night and in speed zones of 90 km/h or more. Larger scooter crashes were more severe in 70 km/h zones (than 60 km/h zones) but not in higher speed zones, and less severe on weekends than on weekdays. The findings can be used to inform potential crash and injury countermeasures tailored to users of different PTW types.

Relevance:

60.00%

Publisher:

Abstract:

The entry into force of the Free Trade Agreement (FTA) with the United States gives Colombian business owners the opportunity to access the world's most important market from a privileged position, which makes it easier for export-oriented companies to place products in that country. However, the high level of competition and development in this market makes it necessary for companies to have appropriate information that lets them focus their efforts on specific products, market segments or states where they can achieve sustainability and longevity, as well as develop new commercial possibilities. To that end, we carried out this research project, which seeks to produce a software tool containing information on the trade flow from Colombia to each of the 50 US states, detailing in each case the commercial opportunities identified by tariff heading. The tool will support Colombian business owners who seek to benefit from the new trade environment offered by the bilateral agreement.

Relevance:

60.00%

Publisher:

Abstract:

Prior studies on museum visitors have centred largely on national museums, while studies on regional museums are scarce. To fill this academic gap, this study examines the visitors of Dalarna Museum, a regional museum in Sweden. With the aim of profiling visitors’ demographic characteristics and investigating the motivational factors that influence visitors’ frequency of visits, a face-to-face questionnaire survey was conducted at Dalarna Museum. To capture demographic characteristics, a few closed and open questions were devised covering visitors’ gender, age, occupation, income, education, number of children and place of residence. To investigate the motivational factors, a seven-point Likert questionnaire covering 17 motivational factors was employed. During 12 days of data collection, 372 visitors were invited to participate in the survey, of whom 357 filled in the questionnaire, giving a response rate of 96 percent. After data cleansing, 355 completed and valid responses remained. According to the results, frequent and occasional visitors are similar in gender, age, occupation, income, and number of children. However, the two groups differ in place of residence and educational attainment. A multiple regression analysis found that 13 of the 17 motivational factors have a significant influence on visitors’ frequency of visits to Dalarna Museum, the most influential being visitors’ days out with their friends and relatives.

Relevance:

60.00%

Publisher:

Abstract:

In group decision making (GDM) problems, ordinal data provide a convenient way for decision makers (DMs) to articulate preferences. A number of GDM models have been proposed in the literature to aggregate this kind of preference. However, most of the GDM models that handle ordinal preferences suffer from two drawbacks: (1) it is difficult for the GDM models to manage conflicting opinions, especially with a large number of DMs; and (2) the relationships between the preferences provided by the DMs are neglected, and all DMs are assumed to be of equal importance, so the aggregated collective preference is not an ideal representation of the group's decision. In order to overcome these problems, a two-stage dynamic group decision making method for aggregating ordinal preferences is proposed in this paper. The method consists of two main processes: (i) a data cleansing process, which aims to reduce the influence of conflicting opinions on the collective decision prior to the aggregation process, thereby providing an effective solution for large-scale GDM problems; and (ii) a support degree oriented consensus-reaching process, in which the collective preference is aggregated using the Power Average (PA) operator, so that the relationships between the arguments being aggregated are taken into consideration (i.e., the values being aggregated are allowed to support each other). A new support function for the PA operator to deal with ordinal information is defined based on the dominance-based rough set approach. The proposed GDM model is compared with the models presented by Herrera-Viedma et al. An application related to controlling the degradation of the hydrographic basin of a river in Brazil is evaluated. The results demonstrate the usefulness of the proposed method in handling GDM problems with ordinal information.
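The Power Average operator mentioned in this abstract can be sketched as follows, using its usual definition: each value a_i is weighted by 1 + T(a_i), where T(a_i) is the sum of the support a_i receives from the other values. The support function below, `linear_support`, is a simple distance-based stand-in assumed for illustration; it is not the dominance-based rough set support function defined in the paper, and the sample ranks are made up.

```python
def power_average(values, support):
    """Power Average: weight each value by 1 + total support from the others."""
    weights = []
    for i, a in enumerate(values):
        # T(a_i): summed support that a_i receives from every other value
        t = sum(support(a, b) for j, b in enumerate(values) if j != i)
        weights.append(1 + t)
    return sum(w * a for w, a in zip(weights, values)) / sum(weights)

def linear_support(scale):
    """Hypothetical support function: 1 for equal values, 0 at distance `scale`."""
    def support(a, b):
        return 1 - abs(a - b) / scale
    return support

# Made-up ordinal ranks from four decision makers on a 1..5 scale.
ranks = [1, 1, 1, 5]
pa = power_average(ranks, linear_support(scale=4))
print(pa)
```

With these inputs the three agreeing values support each other and the outlier receives no support, so the result is pulled towards the majority and falls below the arithmetic mean of 2.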

Relevance:

60.00%

Publisher:

Abstract:

Aiming to ensure greater reliability and consistency of data stored in the database, the data cleaning stage is set early in the process of Knowledge Discovery in Databases (KDD) and is responsible for eliminating problems and adjusting the data for the later stages, especially the data mining stage. Such problems occur at the instance and schema levels and include missing values, null values, duplicate tuples, and values outside the domain, among others. Several algorithms have been developed to perform the cleaning step in databases; some of them were developed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, this work presents as its original contribution an optimization of an algorithm for detecting duplicate tuples in databases through phonetic matching, based on multithreading and without the need for training data, as well as a language-independent environment to support it. © 2011 IEEE.
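To make phonetic duplicate detection concrete, here is a minimal single-threaded sketch that groups names by their American Soundex code. Soundex is swapped in as a well-known phonetic algorithm; the abstract does not say which algorithm the authors optimise, and the multithreading it mentions is not reproduced here. The function names and sample data are assumptions for the example.

```python
from collections import defaultdict

# Soundex digit for each coded consonant; all other letters carry no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(word):
    """Return the four-character American Soundex code of a word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        if ch not in "HW":  # H and W do not separate equal codes
            prev = code
    return (encoded + "000")[:4]

def candidate_duplicates(names):
    """Group names that share a Soundex code; groups of one are dropped."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]

names = ["Smith", "Smyth", "Robert", "Rupert", "Jones"]
groups = candidate_duplicates(names)
print(groups)
```

Names in the same group are only candidate duplicates. Soundex is tuned to English pronunciation, so a language-independent tool like the one described in the abstract would need a different encoding.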

Relevance:

30.00%

Publisher:

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevance:

20.00%

Publisher:

Relevance:

20.00%

Publisher:

Abstract:

Recent data indicate that levels of overweight and obesity are increasing at an alarming rate throughout the world. At a population level (and commonly to assess individual health risk), the prevalence of overweight and obesity is calculated using cut-offs of the Body Mass Index (BMI) derived from height and weight. Similarly, the BMI is also used to classify individuals and to provide a notional indication of potential health risk. It is likely that epidemiologic surveys that are reliant on BMI as a measure of adiposity will overestimate the number of individuals in the overweight (and slightly obese) categories. This tendency to misclassify individuals may be more pronounced in athletic populations or groups in which the proportion of more active individuals is higher. This differential is most pronounced in sports where it is advantageous to have a high BMI (but not necessarily high fatness). To illustrate this point we calculated the BMIs of international professional rugby players from the four teams involved in the semi-finals of the 2003 Rugby Union World Cup. According to the World Health Organisation (WHO) cut-offs for BMI, approximately 65% of the players were classified as overweight and approximately 25% as obese. These findings demonstrate that a high BMI is commonplace (and a potentially desirable attribute for sport performance) in professional rugby players. An unanswered question is what proportion of the wider population, classified as overweight (or obese) according to the BMI, is misclassified according to both fatness and health risk. It is evident that being overweight should not be an obstacle to a physically active lifestyle. Similarly, a reliance on BMI alone may misclassify a number of individuals who might otherwise have been automatically considered fat and/or unfit.
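The classification discussed in this abstract can be reproduced with a short sketch. BMI is weight in kilograms divided by the square of height in metres, and the WHO adult cut-offs are 25 for overweight and 30 for obese. The sample height and weight are invented for illustration and are not data from the study.

```python
def bmi(weight_kg, height_m):
    """Body Mass Index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2

def who_category(value):
    """Classify an adult BMI using the WHO cut-offs."""
    if value >= 30:
        return "obese"
    if value >= 25:
        return "overweight"
    if value >= 18.5:
        return "normal"
    return "underweight"

# Hypothetical muscular athlete: 1.85 m tall, 105 kg.
athlete_bmi = bmi(105, 1.85)
print(round(athlete_bmi, 1), who_category(athlete_bmi))
```

The function sees only height and weight, so it cannot distinguish muscle from fat, which is the misclassification risk the abstract describes.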

Relevance:

20.00%

Publisher:

Abstract:

In this paper, a singularly perturbed ordinary differential equation with non-smooth data is considered. The numerical method is a Petrov-Galerkin finite element method with piecewise-exponential test functions and piecewise-linear trial functions. A special technique is used at the point of discontinuity of the coefficient. The method is shown to be first-order accurate and uniformly convergent with respect to the singular perturbation parameter. Finally, numerical results are presented, which are in agreement with the theoretical results.