750 resultados para zero-inflated data
Resumo:
The discovery of protein variation is an important strategy in disease diagnosis within the biological sciences. The current benchmark for elucidating information from multiple biological variables is the so called “omics” disciplines of the biological sciences. Such variability is uncovered by implementation of multivariable data mining techniques which come under two primary categories, machine learning strategies and statistical based approaches. Typically proteomic studies can produce hundreds or thousands of variables, p, per observation, n, depending on the analytical platform or method employed to generate the data. Many classification methods are limited by an n≪p constraint, and as such, require pre-treatment to reduce the dimensionality prior to classification. Recently machine learning techniques have gained popularity in the field for their ability to successfully classify unknown samples. One limitation of such methods is the lack of a functional model allowing meaningful interpretation of results in terms of the features used for classification. This is a problem that might be solved using a statistical model-based approach where not only is the importance of the individual protein explicit, they are combined into a readily interpretable classification rule without relying on a black box approach. Here we incorporate statistical dimension reduction techniques Partial Least Squares (PLS) and Principal Components Analysis (PCA) followed by both statistical and machine learning classification methods, and compared them to a popular machine learning technique, Support Vector Machines (SVM). Both PLS and SVM demonstrate strong utility for proteomic classification problems.
Resumo:
The research team recognized the value of network-level Falling Weight Deflectometer (FWD) testing to evaluate the structural condition trends of flexible pavements. However, practical limitations due to the cost of testing, traffic control and safety concerns and the ability to test a large network may discourage some agencies from conducting the network-level FWD testing. For this reason, the surrogate measure of the Structural Condition Index (SCI) is suggested for use. The main purpose of the research presented in this paper is to investigate data mining strategies and to develop a prediction method of the structural condition trends for network-level applications which does not require FWD testing. The research team first evaluated the existing and historical pavement condition, distress, ride, traffic and other data attributes in the Texas Department of Transportation (TxDOT) Pavement Maintenance Information System (PMIS), applied data mining strategies to the data, discovered useful patterns and knowledge for SCI value prediction, and finally provided a reasonable measure of pavement structural condition which is correlated to the SCI. To evaluate the performance of the developed prediction approach, a case study was conducted using the SCI data calculated from the FWD data collected on flexible pavements over a 5-year period (2005 – 09) from 354 PMIS sections representing 37 pavement sections on the Texas highway system. The preliminary study results showed that the proposed approach can be used as a supportive pavement structural index in the event when FWD deflection data is not available and help pavement managers identify the timing and appropriate treatment level of preventive maintenance activities.
Resumo:
Although accountability in the form of high stakes testing is in favour in the contemporary Australian educational context, this practice remains a highly contested source of debate. Proponents for high stakes tests claim that higher standards in teaching and learning result from their implementation, whereas others believe that this type of testing regime is not required and may even in fact be counterproductive. Regardless of what side of the debate you sit on, the reality is that at present, high stakes testing appears to be here to stay. It could therefore be argued it is essential that teachers understand accountability and possess the specific skills to interpret and use test data beneficially.
Resumo:
This paper demonstrates the affordances of the work diary as a data collection tool for both pilot studies and qualitative research of social interactions. Observation is the cornerstone of many qualitative, ethnographic research projects (Creswell, 2008). However, determining through observation, the activities of busy school teams could be likened to joining dots of a child’s drawing activity to reveal a complex picture of interactions. Teachers, leaders and support personnel are in different locations within a school, performing diverse tasks for a variety of outcomes, which hopefully achieve a common goal. As a researcher, the quest to observe these busy teams and their interactions with each other was daunting and perhaps unrealistic. The decision to use a diary as part of a wider research project was to overcome the physical impossibility of simultaneously observing multiple team members. One reported advantage of the use of the diary in research was its suitability as a substitute for lengthy researcher observation, because multiple data sets could be collected at once (Lewis et al, 2005; Marelli, 2007).
Resumo:
A substantial body of literature exists identifying factors contributing to under-performing Enterprise Resource Planning systems (ERPs), including poor communication, lack of executive support and user dissatisfaction (Calisir et al., 2009). Of particular interest is Momoh et al.’s (2010) recent review identifying poor data quality (DQ) as one of nine critical factors associated with ERP failure. DQ is central to ERP operating processes, ERP facilitated decision-making and inter-organizational cooperation (Batini et al., 2009). Crucial in ERP contexts is that the integrated, automated, process driven nature of ERP data flows can amplify DQ issues, compounding minor errors as they flow through the system (Haug et al., 2009; Xu et al., 2002). However, the growing appreciation of the importance of DQ in determining ERP success lacks research addressing the relationship between stakeholders’ requirements and perceptions of ERP DQ, perceived data utility and the impact of users’ treatment of data on ERP outcomes.
Resumo:
This study determined the rate and indication for revision between cemented, uncemented, hybrid and resurfacing groups from NJR (6 th edition) data. Data validity was determined by interrogating for episodes of misclassification. We identified 6,034 (2.7%) misclassified episodes, containing 97 (4.3%) revisions. Kaplan-Meier revision rates at 3 years were 0.9% cemented, 1.9% for uncemented, 1.2% for hybrids and 3.0% for resurfacings (significant difference across all groups, p<0.001, with identical pattern in patients <55 years). Regression analysis indicated both prosthesis group and age significantly influenced failure (p<0.001). Revision for pain, aseptic loosening, and malalignment were highest in uncemented and resurfacing arthroplasty. Revision for dislocation was highest in uncemented hips (significant difference between groups, p<0.001). Feedback to the NJR on data misclassification has been made for future analysis. © 2012 Wichtig Editore.
Resumo:
Secrecy of decryption keys is an important pre-requisite for security of any encryption scheme and compromised private keys must be immediately replaced. \emph{Forward Security (FS)}, introduced to Public Key Encryption (PKE) by Canetti, Halevi, and Katz (Eurocrypt 2003), reduces damage from compromised keys by guaranteeing confidentiality of messages that were encrypted prior to the compromise event. The FS property was also shown to be achievable in (Hierarchical) Identity-Based Encryption (HIBE) by Yao, Fazio, Dodis, and Lysyanskaya (ACM CCS 2004). Yet, for emerging encryption techniques, offering flexible access control to encrypted data, by means of functional relationships between ciphertexts and decryption keys, FS protection was not known to exist.\smallskip In this paper we introduce FS to the powerful setting of \emph{Hierarchical Predicate Encryption (HPE)}, proposed by Okamoto and Takashima (Asiacrypt 2009). Anticipated applications of FS-HPE schemes can be found in searchable encryption and in fully private communication. Considering the dependencies amongst the concepts, our FS-HPE scheme implies forward-secure flavors of Predicate Encryption and (Hierarchical) Attribute-Based Encryption.\smallskip Our FS-HPE scheme guarantees forward security for plaintexts and for attributes that are hidden in HPE ciphertexts. It further allows delegation of decrypting abilities at any point in time, independent of FS time evolution. It realizes zero-inner-product predicates and is proven adaptively secure under standard assumptions. As the ``cross-product" approach taken in FS-HIBE is not directly applicable to the HPE setting, our construction resorts to techniques that are specific to existing HPE schemes and extends them with what can be seen as a reminiscent of binary tree encryption from FS-PKE.
Resumo:
Background Bactrocera dorsalis s.s. is a pestiferous tephritid fruit fly distributed from Pakistan to the Pacific, with the Thai/Malay peninsula its southern limit. Sister pest taxa, B. papayae and B. philippinensis, occur in the southeast Asian archipelago and the Philippines, respectively. The relationship among these species is unclear due to their high molecular and morphological similarity. This study analysed population structure of these three species within a southeast Asian biogeographical context to assess potential dispersal patterns and the validity of their current taxonomic status. Results Geometric morphometric results generated from 15 landmarks for wings of 169 flies revealed significant differences in wing shape between almost all sites following canonical variate analysis. For the combined data set there was a greater isolation-by-distance (IBD) effect under a ‘non-Euclidean’ scenario which used geographical distances within a biogeographical ‘Sundaland context’ (r2 = 0.772, P < 0.0001) as compared to a ‘Euclidean’ scenario for which direct geographic distances between sample sites was used (r2 = 0.217, P < 0.01). COI sequence data were obtained for 156 individuals and yielded 83 unique haplotypes with no correlation to current taxonomic designations via a minimum spanning network. BEAST analysis provided a root age and location of 540kya in northern Thailand, with migration of B. dorsalis s.l. into Malaysia 470kya and Sumatra 270kya. Two migration events into the Philippines are inferred. Sequence data revealed a weak but significant IBD effect under the ‘non-Euclidean’ scenario (r2 = 0.110, P < 0.05), with no historical migration evident between Taiwan and the Philippines. Results are consistent with those expected at the intra-specific level. Conclusions Bactrocera dorsalis s.s., B. papayae and B. philippinensis likely represent one species structured around the South China Sea, having migrated from northern Thailand into the southeast Asian archipelago and across into the Philippines. No migration is apparent between the Philippines and Taiwan. This information has implications for quarantine, trade and pest management.
Resumo:
This article presents a methodology that integrates cumulative plots with probe vehicle data for estimation of travel time statistics (average, quartile) on urban networks. The integration reduces relative deviation among the cumulative plots so that the classical analytical procedure of defining the area between the plots as the total travel time can be applied. For quartile estimation, a slicing technique is proposed. The methodology is validated with real data from Lucerne, Switzerland and it is concluded that the travel time estimates from the proposed methodology are statistically equivalent to the observed values.
Resumo:
This study proceeds from a central interest in the importance of systematically evaluating operational large-scale integrated information systems (IS) in organisations. The study is conducted within the IS-Impact Research Track at Queensland University of Technology (QUT). The goal of the IS-Impact Track is, "to develop the most widely employed model for benchmarking information systems in organizations for the joint benefit of both research and practice" (Gable et al, 2009). The track espouses programmatic research having the principles of incrementalism, tenacity, holism and generalisability through replication and extension research strategies. Track efforts have yielded the bicameral IS-Impact measurement model; the ‘impact’ half includes Organisational-Impact and Individual-Impact dimensions; the ‘quality’ half includes System-Quality and Information-Quality dimensions. Akin to Gregor’s (2006) analytic theory, the ISImpact model is conceptualised as a formative, multidimensional index and is defined as "a measure at a point in time, of the stream of net benefits from the IS, to date and anticipated, as perceived by all key-user-groups" (Gable et al., 2008, p: 381). The study adopts the IS-Impact model (Gable, et al., 2008) as its core theory base. Prior work within the IS-Impact track has been consciously constrained to Financial IS for their homogeneity. This study adopts a context-extension strategy (Berthon et al., 2002) with the aim "to further validate and extend the IS-Impact measurement model in a new context - i.e. a different IS - Human Resources (HR)". The overarching research question is: "How can the impacts of large-scale integrated HR applications be effectively and efficiently benchmarked?" This managerial question (Cooper & Emory, 1995) decomposes into two more specific research questions – In the new HR context: (RQ1): "Is the IS-Impact model complete?" (RQ2): "Is the ISImpact model valid as a 1st-order formative, 2nd-order formative multidimensional construct?" The study adhered to the two-phase approach of Gable et al. (2008) to hypothesise and validate a measurement model. The initial ‘exploratory phase’ employed a zero base qualitative approach to re-instantiating the IS-Impact model in the HR context. The subsequent ‘confirmatory phase’ sought to validate the resultant hypothesised measurement model against newly gathered quantitative data. The unit of analysis for the study is the application, ‘ALESCO’, an integrated large-scale HR application implemented at Queensland University of Technology (QUT), a large Australian university (with approximately 40,000 students and 5000 staff). Target respondents of both study phases were ALESCO key-user-groups: strategic users, management users, operational users and technical users, who directly use ALESCO or its outputs. An open-ended, qualitative survey was employed in the exploratory phase, with the objective of exploring the completeness and applicability of the IS-Impact model’s dimensions and measures in the new context, and to conceptualise any resultant model changes to be operationalised in the confirmatory phase. Responses from 134 ALESCO users to the main survey question, "What do you consider have been the impacts of the ALESCO (HR) system in your division/department since its implementation?" were decomposed into 425 ‘impact citations.’ Citation mapping using a deductive (top-down) content analysis approach instantiated all dimensions and measures of the IS-Impact model, evidencing its content validity in the new context. Seeking to probe additional (perhaps negative) impacts; the survey included the additional open question "In your opinion, what can be done better to improve the ALESCO (HR) system?" Responses to this question decomposed into a further 107 citations which in the main did not map to IS-Impact, but rather coalesced around the concept of IS-Support. Deductively drawing from relevant literature, and working inductively from the unmapped citations, the new ‘IS-Support’ construct, including the four formative dimensions (i) training, (ii) documentation, (iii) assistance, and (iv) authorisation (each having reflective measures), was defined as: "a measure at a point in time, of the support, the [HR] information system key-user groups receive to increase their capabilities in utilising the system." Thus, a further goal of the study became validation of the IS-Support construct, suggesting the research question (RQ3): "Is IS-Support valid as a 1st-order reflective, 2nd-order formative multidimensional construct?" With the aim of validating IS-Impact within its nomological net (identification through structural relations), as in prior work, Satisfaction was hypothesised as its immediate consequence. The IS-Support construct having derived from a question intended to probe IS-Impacts, too was hypothesised as antecedent to Satisfaction, thereby suggesting the research question (RQ4): "What is the relative contribution of IS-Impact and IS-Support to Satisfaction?" With the goal of testing the above research questions, IS-Impact, IS-Support and Satisfaction were operationalised in a quantitative survey instrument. Partial least squares (PLS) structural equation modelling employing 221 valid responses largely evidenced the validity of the commencing IS-Impact model in the HR context. ISSupport too was validated as operationalised (including 11 reflective measures of its 4 formative dimensions). IS-Support alone explained 36% of Satisfaction; IS-Impact alone 70%; in combination both explaining 71% with virtually all influence of ISSupport subsumed by IS-Impact. Key study contributions to research include: (1) validation of IS-Impact in the HR context, (2) validation of a newly conceptualised IS-Support construct as important antecedent of Satisfaction, and (3) validation of the redundancy of IS-Support when gauging IS-Impact. The study also makes valuable contributions to practice, the research track and the sponsoring organisation.
Resumo:
Structural health monitoring (SHM) refers to the procedure used to assess the condition of structures so that their performance can be monitored and any damage can be detected early. Early detection of damage and appropriate retrofitting will aid in preventing failure of the structure and save money spent on maintenance or replacement and ensure the structure operates safely and efficiently during its whole intended life. Though visual inspection and other techniques such as vibration based ones are available for SHM of structures such as bridges, the use of acoustic emission (AE) technique is an attractive option and is increasing in use. AE waves are high frequency stress waves generated by rapid release of energy from localised sources within a material, such as crack initiation and growth. AE technique involves recording these waves by means of sensors attached on the surface and then analysing the signals to extract information about the nature of the source. High sensitivity to crack growth, ability to locate source, passive nature (no need to supply energy from outside, but energy from damage source itself is utilised) and possibility to perform real time monitoring (detecting crack as it occurs or grows) are some of the attractive features of AE technique. In spite of these advantages, challenges still exist in using AE technique for monitoring applications, especially in the area of analysis of recorded AE data, as large volumes of data are usually generated during monitoring. The need for effective data analysis can be linked with three main aims of monitoring: (a) accurately locating the source of damage; (b) identifying and discriminating signals from different sources of acoustic emission and (c) quantifying the level of damage of AE source for severity assessment. In AE technique, the location of the emission source is usually calculated using the times of arrival and velocities of the AE signals recorded by a number of sensors. But complications arise as AE waves can travel in a structure in a number of different modes that have different velocities and frequencies. Hence, to accurately locate a source it is necessary to identify the modes recorded by the sensors. This study has proposed and tested the use of time-frequency analysis tools such as short time Fourier transform to identify the modes and the use of the velocities of these modes to achieve very accurate results. Further, this study has explored the possibility of reducing the number of sensors needed for data capture by using the velocities of modes captured by a single sensor for source localization. A major problem in practical use of AE technique is the presence of sources of AE other than crack related, such as rubbing and impacts between different components of a structure. These spurious AE signals often mask the signals from the crack activity; hence discrimination of signals to identify the sources is very important. This work developed a model that uses different signal processing tools such as cross-correlation, magnitude squared coherence and energy distribution in different frequency bands as well as modal analysis (comparing amplitudes of identified modes) for accurately differentiating signals from different simulated AE sources. Quantification tools to assess the severity of the damage sources are highly desirable in practical applications. Though different damage quantification methods have been proposed in AE technique, not all have achieved universal approval or have been approved as suitable for all situations. The b-value analysis, which involves the study of distribution of amplitudes of AE signals, and its modified form (known as improved b-value analysis), was investigated for suitability for damage quantification purposes in ductile materials such as steel. This was found to give encouraging results for analysis of data from laboratory, thereby extending the possibility of its use for real life structures. By addressing these primary issues, it is believed that this thesis has helped improve the effectiveness of AE technique for structural health monitoring of civil infrastructures such as bridges.
Resumo:
The health effects of environmental hazards are often examined using time series of the association between a daily response variable (e.g., death) and a daily level of exposure (e.g., temperature). Exposures are usually the average from a network of stations. This gives each station equal importance, and negates the opportunity for some stations to be better measures of exposure. We used a Bayesian hierarchical model that weighted stations using random variables between zero and one. We compared the weighted estimates to the standard model using data on health outcomes (deaths and hospital admissions) and exposures (air pollution and temperature) in Brisbane, Australia. The improvements in model fit were relatively small, and the estimated health effects of pollution were similar using either the standard or weighted estimates. Spatial weighted exposures would be probably more worthwhile when there is either greater spatial detail in the health outcome, or a greater spatial variation in exposure.
Resumo:
Many common diseases, such as the flu and cardiovascular disease, increase markedly in winter and dip in summer. These seasonal patterns have been part of life for millennia and were first noted in ancient Greece by both Hippocrates and Herodotus. Recent interest has focused on climate change, and the concern that seasons will become more extreme with harsher winter and summer weather. We describe a set of R functions designed to model seasonal patterns in disease. We illustrate some simple descriptive and graphical methods, a more complex method that is able to model non-stationary patterns, and the case–crossover for controlling for seasonal confounding.
Resumo:
Advances in information and communication technologies have brought about an information revolution, leading to fundamental changes in the way that information is collected or generated, shared and distributed. The importance of establishing systems in which research findings can be readily made available to and used by other researchers has long been recognized in international scientific collaborations. If the data access principles adopted by international scientific collaborations are to be effectively implemented they must be supported by the national policies and laws in place in the countries in which participating researchers are operating.
Resumo:
While undertaking the ANDS RDA Gold Standard Record Exemplars project, research data sharing was discussed with many QUT researchers. Our experiences provided rich insight into researcher attitudes towards their data and the sharing of such data. Generally, we found traditional altruistic motivations for research data sharing did not inspire researchers, but an explanation of the more achievement-oriented benefits were more compelling.