955 results for Large datasets
Abstract:
An important aspect of decision support systems involves applying sophisticated and flexible statistical models to real datasets and communicating the results to decision makers in interpretable ways. An important class of problem is the modelling of incidence, such as fire or disease. Models of incidence known as point processes or Cox processes are particularly challenging because they are ‘doubly stochastic’, i.e. obtaining the probability mass function of incidents requires two integrals to be evaluated. Existing approaches either use simple models that obtain predictions from plug-in point estimates and do not distinguish between Cox processes and density estimation, although they do use sophisticated 3D visualization for interpretation; alternatively, other work employs sophisticated non-parametric Bayesian Cox process models but does not use visualization to render complex spatio-temporal forecasts interpretable. The contribution here is to fill this gap by inferring predictive distributions of log-Gaussian Cox processes and rendering them using state-of-the-art 3D visualization techniques. This requires performing inference on an approximation of the model over a large discretized grid and adapting an existing spatial-diurnal kernel to the log-Gaussian Cox process context.
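To make the grid-based approximation concrete, the following is a minimal sketch, assuming a 1D grid, a squared-exponential kernel and illustrative parameter values (none taken from the paper): the latent log-intensity is a Gaussian process evaluated at cell centres, and cell counts are Poisson with mean equal to the exponentiated field times cell area.

```python
# Minimal sketch (not the paper's implementation): a log-Gaussian Cox process
# approximated on a discretized grid. Grid size and kernel parameters are
# hypothetical, chosen for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# 1D grid of cell centres; the paper's setting uses a large spatial(-temporal) grid.
n_cells = 50
centres = np.linspace(0.0, 1.0, n_cells)
cell_width = centres[1] - centres[0]

# Squared-exponential covariance for the latent Gaussian field (assumed kernel).
lengthscale, variance = 0.1, 1.0
d = centres[:, None] - centres[None, :]
K = variance * np.exp(-0.5 * (d / lengthscale) ** 2) + 1e-8 * np.eye(n_cells)

# Draw the latent log-intensity and the resulting Poisson counts per cell.
f = rng.multivariate_normal(mean=np.zeros(n_cells), cov=K)
intensity = np.exp(f)                         # lambda(s) = exp(f(s))
counts = rng.poisson(intensity * cell_width)

def poisson_loglik(counts, f, cell_width):
    """Poisson log-likelihood of grid counts given the latent field
    (up to an additive constant); the target of grid-based inference."""
    mu = np.exp(f) * cell_width
    return np.sum(counts * np.log(mu) - mu)

print(poisson_loglik(counts, f, cell_width))
```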
Abstract:
The terrorist attacks in the United States on September 11, 2001 appeared to be a harbinger of increased terrorism and violence in the 21st century, bringing terrorism and political violence to the forefront of public discussion. Questions about these events abound, and “Estimating the Historical and Future Probabilities of Large Scale Terrorist Event” [Clauset and Woodard (2013)] asks specifically, “how rare are large scale terrorist events?” and, more generally, encourages discussion of the role of quantitative methods in terrorism research, policy and decision-making. Answering the primary question raises two challenges. The first is identifying terrorist events. The second is finding a simple yet robust model for rare events with good explanatory and predictive capabilities. The challenge of identifying terrorist events is acknowledged and addressed by reviewing and using data from two well-known and reputable sources: the Memorial Institute for the Prevention of Terrorism-RAND database (MIPT-RAND) [Memorial Institute for the Prevention of Terrorism] and the Global Terrorism Database (GTD) [National Consortium for the Study of Terrorism and Responses to Terrorism (START) (2012), LaFree and Dugan (2007)]. Clauset and Woodard (2013) provide a detailed discussion of the limitations of the data and the models used, in the context of the larger issues surrounding terrorism and policy.
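As a rough illustration of the kind of rare-event calculation the question invites, the sketch below fits a power-law tail to synthetic event severities by maximum likelihood (a Hill-type estimator) and reads off an exceedance probability. It is not Clauset and Woodard's estimator; the threshold, exponent and data are placeholders.

```python
# Illustrative only: fit a power-law tail above a threshold xmin by maximum
# likelihood and estimate the probability that an event exceeds a large size.
# Data are synthetic; xmin and the tail exponent are assumptions.
import numpy as np

rng = np.random.default_rng(1)
xmin = 10.0
# Synthetic severities drawn from a Pareto tail (CCDF exponent 1.4).
sizes = xmin * (1.0 - rng.random(5000)) ** (-1.0 / 1.4)

# Continuous power-law MLE (Hill estimator) for the density exponent alpha.
tail = sizes[sizes >= xmin]
alpha = 1.0 + len(tail) / np.sum(np.log(tail / xmin))

def exceedance_prob(x, alpha, xmin):
    """Under the fitted model, P(X >= x) = (x / xmin) ** (1 - alpha) for x >= xmin."""
    return (x / xmin) ** (1.0 - alpha)

print(f"alpha ~ {alpha:.2f}, P(X >= 1000) ~ {exceedance_prob(1000.0, alpha, xmin):.2e}")
```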
Abstract:
Attempts by universities to provide an improved learning environment for students have led to an increase in team-teaching approaches in higher education. While definitions of team-teaching differ slightly, its benefits have been cited widely in the higher education literature. By tapping the specialist knowledge of a variety of staff members, students are exposed to current and emerging knowledge in different fields and topic areas, and are able to understand concepts from a variety of viewpoints. However, while there is some evidence of the usefulness of team-teaching, empirical support for how well students appreciate and adapt to team-teaching approaches remains patchy. This paper reports on the team-teaching approaches adopted in the delivery of an introductory journalism and communication course at the University of Queensland. The success of the approaches is examined against quantitative and qualitative data. The study found that team-teaching is generally very well received by undergraduate students because they value the diverse expertise and teaching styles they are exposed to. Despite the positive feedback, students also complained about problems of continuity and cohesiveness.
Abstract:
To characterize aphid mitochondrial genome (mitogenome) features, we sequenced the complete mitogenome of the Russian wheat aphid, Diuraphis noxia. The 15,784-bp mitogenome with a high A + T content (84.76%) and strong C skew (− 0.26) was arranged in the same gene order as that of the ancestral insect. Unlike typical insect mitogenomes, D. noxia possessed a large tandem repeat region (644 bp) located between trnE and trnF. Sequencing partial mitogenome of the cotton aphid (Aphis gossypii) further confirmed the presence of the large repeat region in aphids, but with different repeat length and copy number. Another motif (58 bp) tandemly repeated 2.3 times in the control region of D. noxia. All repeat units in D. noxia could be folded into stem-loop secondary structures, which could further promote an increase in copy numbers. Characterization of the D. noxia mitogenome revealed distinct mitogenome architectures, thus advancing our understanding of insect mitogenomic diversities and evolution.
Abstract:
The growing public concern about the complexity, cost and uncertain efficacy of the statutory environmental impact assessment process applying to large-scale projects in Queensland is reviewed. The review is based on field data gathered over the past six years at large-scale marina developments that access major environmental reserves along the coast. An ecological design proposal to broaden the process, consistent with both government aspirations and regional ecological parameters and termed Regional Landscape Strategies, would allow the existing Environmental Impact Assessment to be modified along potentially more practicable and effective lines.
Abstract:
Ecological principles have been employed to assist the sustainability of a suite of 'gateway' marinas currently being developed in Queensland. Tasks included (a) locating and fostering core remnant native vegetation areas, (b) understanding the dynamic patterns of regional behaviour using the ecological strategies employed by key flora and fauna species, (c) promoting those native wildlife species that best characterise the region, and (d) allocating management actions along elongated buffer zones extending to the catchment headwaters (rather than only peripheral to the property). Such a response is lacking in the planning and detailing of new marinas, in both the design of infrastructure and its relationship to sustainable landscape development. This paper distinguishes between the practice of landscape ecology and the design of ecological landscapes, offering examples of the principles of the latter in support of the concept of ecological landscape practice.
Abstract:
We present two unconditionally secure protocols for private set disjointness tests. To provide intuition for our protocols, we give a naive example that applies Sylvester matrices. Unfortunately, this simple construction is insecure, as it reveals information about the intersection cardinality; more specifically, it discloses its lower bound. Using Lagrange interpolation, we provide a protocol for the honest-but-curious case that reveals no additional information. Finally, we describe a protocol that is secure against malicious adversaries, in which a verification test is applied to detect misbehaving participants. Both protocols require O(1) rounds of communication. Our protocols are more efficient than previous protocols in terms of communication and computation overhead. Unlike previous protocols, whose security relies on computational assumptions, our protocols provide information-theoretic security. To our knowledge, our protocols are the first to be designed without a generic secure function evaluation. More importantly, they are the most efficient protocols for private disjointness tests in the malicious adversary case.
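For readers unfamiliar with the algebraic idea, here is a minimal, non-private sketch of the naive Sylvester-matrix construction mentioned above: each set is encoded as a polynomial whose roots are its elements, and the determinant of the Sylvester matrix (the resultant) vanishes exactly when the two polynomials share a root, i.e. when the sets intersect. The protocols in the paper hide these intermediate values; the names and example values below are illustrative only.

```python
# Non-private sketch of the naive Sylvester-matrix idea; not a secure protocol.
import numpy as np

def set_polynomial(elements):
    """Coefficients (highest degree first) of prod (x - e) over the set."""
    coeffs = np.array([1.0])
    for e in elements:
        coeffs = np.convolve(coeffs, np.array([1.0, -float(e)]))
    return coeffs

def sylvester_matrix(p, q):
    """Sylvester matrix of polynomials p (degree m) and q (degree n)."""
    m, n = len(p) - 1, len(q) - 1
    S = np.zeros((m + n, m + n))
    for i in range(n):
        S[i, i:i + m + 1] = p
    for i in range(m):
        S[n + i, i:i + n + 1] = q
    return S

def sets_are_disjoint(a, b):
    """The resultant (det of the Sylvester matrix) is zero iff a and b intersect."""
    p, q = set_polynomial(a), set_polynomial(b)
    return not np.isclose(np.linalg.det(sylvester_matrix(p, q)), 0.0)

print(sets_are_disjoint({1, 2, 3}, {4, 5}))   # True: disjoint
print(sets_are_disjoint({1, 2, 3}, {3, 9}))   # False: they share 3
```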
Abstract:
Increased levels of polybrominated diphenyl ethers (PBDEs) can occur particularly in dust and soil surrounding facilities that recycle products containing PBDEs, and this may be a source of increased exposure for nearby workers and residents. To investigate, we measured PBDE levels in soil, office dust and blood of workers at the closest workplace (i.e. within 100 m) to a large automotive shredding and metal recycling facility in Brisbane, Australia. The workplace investigated in this study was independent of the automotive shredding facility and was one of approximately 50 businesses of varying types within a relatively large commercial/industrial area surrounding the recycling facility. Concentrations of PBDEs in soils were at least an order of magnitude greater than background levels in the area. Congener profiles were dominated by higher molecular weight congeners, in particular BDE-209, reflecting the profile in outdoor air samples previously collected at this site. Biomonitoring data from blood serum indicated no differential exposure for workers near the recycling facility compared to a reference group of office workers, also in Brisbane. Unlike the air, indoor dust and soil sample profiles, serum samples from both worker groups were dominated by congeners BDE-47, BDE-153, BDE-99, BDE-100 and BDE-183, a profile similar to that previously reported in the general Australian population. Estimated exposures for workers near the industrial point source suggested indoor workers had significantly higher exposure than outdoor workers due to their exposure to indoor dust rather than soil. However, no relationship was observed between blood PBDE levels and the different roles and activity patterns of workers on-site. These comparisons of PBDE levels in serum provide additional insight into inter-individual variability within Australia. Results also indicate that congener patterns in the workplace environment did not match the blood profiles of workers, which was attributed to the relatively high background exposures of the general Australian population via dietary intake and the home environment.
Abstract:
At Eurocrypt’04, Freedman, Nissim and Pinkas introduced the fuzzy private matching problem, defined as follows. Given two parties, each holding a set of vectors with T integer components, fuzzy private matching is to securely test whether each vector of one set matches any vector of the other set on at least t components, where t < T. In the conclusion of their paper, they asked whether it was possible to design a fuzzy private matching protocol without incurring a communication complexity with the factor $\binom{T}{t}$. We answer their question in the affirmative by presenting a protocol based on homomorphic encryption, combined with the novel notion of a share-hiding error-correcting secret sharing scheme, which we show how to implement with efficient decoding using interleaved Reed-Solomon codes. This scheme may be of independent interest. Our protocol is provably secure against passive adversaries, and has better efficiency than previous protocols for certain parameter values.
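As a plaintext illustration of the predicate being computed (the paper's contribution is computing it privately), here is a sketch under assumed toy parameters T = 5 and t = 3.

```python
# Non-private sketch of the fuzzy matching predicate: two vectors of T integer
# components "match" if they agree on at least t < T positions. The paper
# computes this privately; this sketch only states the plaintext condition.
from typing import Sequence

def fuzzy_match(u: Sequence[int], v: Sequence[int], t: int) -> bool:
    """True if u and v agree on at least t coordinates."""
    assert len(u) == len(v)
    return sum(a == b for a, b in zip(u, v)) >= t

def any_fuzzy_match(set_a, set_b, t):
    """True if some vector in set_a fuzzily matches some vector in set_b."""
    return any(fuzzy_match(u, v, t) for u in set_a for v in set_b)

# Example with T = 5 components and threshold t = 3 (hypothetical data).
A = [(1, 2, 3, 4, 5), (9, 9, 9, 9, 9)]
B = [(1, 2, 0, 4, 7), (0, 0, 0, 0, 0)]
print(any_fuzzy_match(A, B, t=3))  # True: the first vectors agree on 3 positions
```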
Abstract:
Next Generation Sequencing (NGS) has revolutionised molecular biology, resulting in an explosion of data sets and an increasing role in clinical practice. Such applications necessarily require rapid identification of the organism as a prelude to annotation and further analysis. NGS data consist of a substantial number of short sequence reads, given context through downstream assembly and annotation, a process requiring reads consistent with the assumed species or species group. Highly accurate results have been obtained for restricted sets using SVM classifiers, but such methods are difficult to parallelise and success depends on careful attention to feature selection. This work examines the problem at very large scale, using a mix of synthetic and real data with a view to determining the overall structure of the problem and the effectiveness of parallel ensembles of simpler classifiers (principally random forests) in addressing the challenges of large scale genomics.
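A minimal sketch of the kind of classifier discussed, assuming k-mer count features and scikit-learn's random forest; the reads, labels and parameters are synthetic placeholders rather than the paper's pipeline.

```python
# Illustrative sketch only: classify short reads by species using k-mer count
# features and a random forest. Data are tiny synthetic stand-ins; the paper
# operates at far larger scale with ensembles trained in parallel.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy reads from two hypothetical species (assumed data).
reads = [
    "ACGTACGTACGTACGT", "ACGTACGAACGTACGT",   # species_a
    "GGTTCCAAGGTTCCAA", "GGTTCCTAGGTTCCAA",   # species_b
]
labels = ["species_a", "species_a", "species_b", "species_b"]

# Overlapping 4-mer counts as features for each read.
vectorizer = CountVectorizer(analyzer="char", ngram_range=(4, 4))
X = vectorizer.fit_transform(reads)

# A small forest; in practice many such forests can be trained in parallel on
# disjoint chunks of reads and their votes combined.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["ACGTACGTACGAACGT"])))
```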
Abstract:
Enterprises, both public and private, have rapidly commenced using the benefits of enterprise resource planning (ERP) combined with business analytics and “open data sets” which are often outside the control of the enterprise to gain further efficiencies, build new service operations and increase business activity. In many cases, these business activities are based around relevant software systems hosted in a “cloud computing” environment. “Garbage in, garbage out”, or “GIGO”, is a term long used to describe problems in unqualified dependency on information systems, dating from the 1960s. However, a more pertinent variation arose sometime later, namely “garbage in, gospel out” signifying that with large scale information systems, such as ERP and usage of open datasets in a cloud environment, the ability to verify the authenticity of those data sets used may be almost impossible, resulting in dependence upon questionable results. Illicit data set “impersonation” becomes a reality. At the same time the ability to audit such results may be an important requirement, particularly in the public sector. This paper discusses the need for enhancement of identity, reliability, authenticity and audit services, including naming and addressing services, in this emerging environment and analyses some current technologies that are offered and which may be appropriate. However, severe limitations to addressing these requirements have been identified and the paper proposes further research work in the area.
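One concrete safeguard in the direction this paper argues for is verifying a downloaded open data set against a digest published by its issuing authority before it enters an ERP or analytics pipeline. The sketch below is a minimal illustration with a hypothetical file name and digest, not a technology evaluated in the paper.

```python
# Minimal integrity check: compare a local copy of an open data set against a
# published SHA-256 digest. File name and digest below are hypothetical.
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_dataset(path: str, expected_digest: str) -> bool:
    """True only if the local copy matches the published digest."""
    return sha256_of_file(path) == expected_digest.lower()

# Hypothetical usage: the digest would come from a trusted, authenticated
# register maintained by the data publisher.
# verify_dataset("open_data/transactions_2024.csv", "9f86d081...")
```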
Abstract:
Enterprise resource planning (ERP) systems are rapidly being combined with “big data” analytics processes and publicly available “open data sets”, which are usually outside the arena of the enterprise, to expand activity through better service to current clients as well as by identifying new opportunities. Moreover, these activities are now largely based around relevant software systems hosted in a “cloud computing” environment. The over 50-year-old phrase expressing mistrust in computer systems, namely “garbage in, garbage out” or “GIGO”, is used to describe problems of unqualified and unquestioning dependency on information systems. However, a more relevant GIGO interpretation arose sometime later, namely “garbage in, gospel out”, signifying that with large-scale information systems based around ERP and open datasets as well as “big data” analytics, particularly in a cloud environment, the ability to verify the authenticity and integrity of the data sets used may be almost impossible. In turn, this may easily result in decision making based upon questionable, unverifiable results. Illicit “impersonation” of, and modifications to, legitimate data sets may become a reality, while at the same time the ability to audit any derived results of analysis may be an important requirement, particularly in the public sector. The pressing need for enhancement of identity, reliability, authenticity and audit services, including naming and addressing services, in this emerging environment is discussed in this paper. Some appropriate technologies currently being offered are also examined. However, severe limitations in addressing the problems identified are found, and the paper proposes further necessary research work for the area. (Note: This paper is based on an earlier unpublished paper/presentation “Identity, Addressing, Authenticity and Audit Requirements for Trust in ERP, Analytics and Big/Open Data in a ‘Cloud’ Computing Environment: A Review and Proposal” presented to the Department of Accounting and IT, College of Management, National Chung Chen University, 20 November 2013.)
Abstract:
Objective: To evaluate the effectiveness and robustness of Anonym, a tool for de-identifying free-text health records based on conditional random field classifiers informed by linguistic and lexical features, as well as features extracted by pattern matching techniques. De-identification of personal health information in electronic health records is essential for the sharing and secondary usage of clinical data. De-identification tools that adapt to different sources of clinical data are attractive as they would require minimal intervention to guarantee high effectiveness. Methods and materials: The effectiveness and robustness of Anonym are evaluated across multiple datasets, including the widely adopted Integrating Biology and the Bedside (i2b2) dataset, used for evaluation in a de-identification challenge. The datasets used here vary in type of health records, source of data, and their quality, with one of the datasets containing optical character recognition errors. Results: Anonym identifies and removes up to 96.6% of personal health identifiers (recall) with a precision of up to 98.2% on the i2b2 dataset, outperforming the best system proposed in the i2b2 challenge. The effectiveness of Anonym across datasets is found to depend on the amount of information available for training. Conclusion: Findings show that Anonym compares to the best approach from the 2006 i2b2 shared task. It is easy to retrain Anonym with new datasets; if retrained, the system is robust to variations in training size, data type and quality, in the presence of sufficient training data.
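To illustrate the general approach (not Anonym's actual model or feature set), here is a minimal conditional random field for token-level de-identification using simple lexical and pattern-matching features, built with the sklearn-crfsuite package on a tiny synthetic example.

```python
# Illustrative sketch only: a token-level CRF for de-identification with simple
# lexical and pattern features. Requires the sklearn-crfsuite package.
import re
import sklearn_crfsuite

def token_features(tokens, i):
    """Lexical and pattern-matching features for the i-th token."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_digit": tok.isdigit(),
        "looks_like_date": bool(re.fullmatch(r"\d{1,2}/\d{1,2}/\d{2,4}", tok)),
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# Tiny synthetic training example: tokens labelled as personal health
# identifiers (PHI) or other (O).
sentences = [["Patient", "John", "Smith", "seen", "on", "12/03/2014"]]
labels = [["O", "PHI", "PHI", "O", "O", "PHI"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)

print(crf.predict(X))
```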
Abstract:
We present efficient protocols for private set disjointness tests. To build intuition for our protocols, we start from a naive construction that applies Sylvester matrices. Unfortunately, this simple construction is insecure, as it reveals information about the cardinality of the intersection; more specifically, it discloses its lower bound. Using Lagrange interpolation, we provide a protocol for the honest-but-curious case that reveals no additional information. Finally, we describe a protocol that is secure against malicious adversaries; this protocol applies a verification test to detect misbehaving participants. Both protocols require O(1) rounds of communication. Our protocols are more efficient than previous protocols in terms of communication and computation overhead. Unlike previous protocols, whose security relies on computational assumptions, our protocols provide information-theoretic security. To our knowledge, our protocols are the first to be designed without a generic secure function evaluation. More importantly, they are the most efficient protocols for private disjointness tests in the malicious adversary case.
Abstract:
There are limited studies describing patient meal preferences in hospital; however, these data are critical to developing menus that address satisfaction and nutrition whilst balancing resources. This quality study aimed to determine preferences for meals and snacks to inform a comprehensive menu revision in a large (929-bed) tertiary public hospital. The method was based on Vivanti et al. (2008), with data collected by two final-year dietetic students. The first survey comprised 72 questions and achieved a response rate of 68% (n = 192); the second, more focused at 47 questions, achieved a higher response rate of 93% (n = 212). Findings showed over half the patients reporting poor or less than normal appetite, 20% describing taste issues, over a third with a LOS > 7 days, a third with an MST ≥ 2, and less than half eating only from the general menu. Soup then toast was most frequently reported as eaten at home when unwell, and whilst most patients reported not missing any foods when in hospital, steak was the most commonly missed (25%). Hot breakfasts were desired by the majority (63%), with over half preferring toast (even if cold). In relation to snacks, nearly half (48%) wanted something more substantial than tea/coffee/biscuits, with sandwiches (54%) and soup (33%) being suggested. Sandwiches at the evening meal were not popular (6%). Difficulties with using cutlery and meal size selection were also identified as issues. Findings from this study had high utility and supported a collaborative and evidence-based approach to a successful major menu change for the hospital.