980 results for data linkage
Abstract:
Recent studies have linked the ability of novice (CS1) programmers to read and explain code with their ability to write code. This study extends earlier work by asking CS2 students to explain object-oriented data structures problems that involve recursion. Results show a strong correlation between ability to explain code at an abstract level and performance on code writing and code reading test problems for these object-oriented data structures problems. The authors postulate that there is a common set of skills concerned with reasoning about programs that explains the correlation between writing code and explaining code. The authors suggest that an overly exclusive emphasis on code writing may be detrimental to learning to program. Non-code writing learning activities (e.g., reading and explaining code) are likely to improve student ability to reason about code and, by extension, improve student ability to write code. A judicious mix of code-writing and code-reading activities is recommended.
Abstract:
BACKGROUND Experimental and epidemiologic evidence has suggested that chronic inflammation may play a critical role in endometrial carcinogenesis. METHODS To investigate this hypothesis, a two-stage study was carried out to evaluate single-nucleotide polymorphisms (SNP) in inflammatory pathway genes in association with endometrial cancer risk. In stage I, 64 candidate pathway genes were identified and 4,542 directly genotyped or imputed SNPs were analyzed among 832 endometrial cancer cases and 2,049 controls, using data from the Shanghai Endometrial Cancer Genetics Study. Linkage disequilibrium of stage I SNPs significantly associated with endometrial cancer (P < 0.05) indicated that the majority of associations could be linked to one of 24 distinct loci. One SNP from each of the 24 loci was then selected for follow-up genotyping. Of these, 21 SNPs were successfully designed and genotyped in stage II, which consisted of 10 additional studies including 6,604 endometrial cancer cases and 8,511 controls. RESULTS Five of the 21 SNPs had significant allelic odds ratios (ORs) and 95% confidence intervals (CI) as follows: FABP1, 0.92 (0.85-0.99); CXCL3, 1.16 (1.05-1.29); IL6, 1.08 (1.00-1.17); MSR1, 0.90 (0.82-0.98); and MMP9, 0.91 (0.87-0.97). Two of these polymorphisms were independently significant in the replication sample (rs352038 in CXCL3 and rs3918249 in MMP9). The association for the MMP9 polymorphism remained significant after Bonferroni correction and showed a significant association with endometrial cancer in both Asian- and European-ancestry samples. CONCLUSIONS These findings lend support to the hypothesis that genetic polymorphisms in genes involved in the inflammatory pathway may contribute to genetic susceptibility to endometrial cancer. Impact statement: This study adds to the growing evidence that inflammation plays an important role in endometrial carcinogenesis.
Abstract:
Road networks are a national critical infrastructure. The road assets need to be monitored and maintained efficiently as their conditions deteriorate over time. The condition of one such asset, road pavement, plays a major role in road network maintenance programmes. Pavement conditions depend upon many factors such as pavement types, traffic and environmental conditions. This paper presents a data analytics case study for assessing the factors affecting the pavement deflection values measured by the traffic speed deflectometer (TSD) device. The analytics process includes acquisition and integration of data from multiple sources, data pre-processing, mining useful information from the data and utilising data mining outputs for knowledge deployment. Data mining techniques are able to show how TSD outputs vary in different roads, traffic and environmental conditions. The generated data mining models map the TSD outputs to some classes and define correction factors for each class.
Abstract:
The quality of the data collection methods selected and the integrity of the data collected are integral to the success of a study. This chapter focuses on data collection and study validity. After reading the chapter, readers should be able to define types of data collection methods in quantitative research; list advantages and disadvantages of each method; discuss factors related to internal and external validity; critically evaluate data collection methods; and discuss the need to operationalise variables of interest for data collection.
Abstract:
A spatial process observed over a lattice or a set of irregular regions is usually modeled using a conditionally autoregressive (CAR) model. The neighborhoods within a CAR model are generally formed deterministically using the inter-distances or boundaries between the regions. An extension of the CAR model is proposed in this article where the selection of the neighborhood depends on unknown parameter(s). This extension is called a Stochastic Neighborhood CAR (SNCAR) model. The resulting model shows flexibility in accurately estimating covariance structures for data generated from a variety of spatial covariance models. Specific examples are illustrated using data generated from some common spatial covariance functions as well as real data concerning radioactive contamination of the soil in Switzerland after the Chernobyl accident.
Abstract:
Environmental monitoring is becoming critical as human activity and climate change place greater pressures on biodiversity, leading to an increasing need for data to make informed decisions. Acoustic sensors can help collect data across large areas for extended periods making them attractive in environmental monitoring. However, managing and analysing large volumes of environmental acoustic data is a great challenge and is consequently hindering the effective utilization of the big dataset collected. This paper presents an overview of our current techniques for collecting, storing and analysing large volumes of acoustic data efficiently, accurately, and cost-effectively.
Abstract:
Recently, we defined a new syndromic form of X-linked mental retardation in a 4-generation family with a unique clinical phenotype characterized by mild mental retardation, choreoathetosis, and abnormal behavior (MRXS10). Linkage analysis in this family revealed a candidate region of 13.4 Mb between markers DXS1201 and DXS991 on Xp11; therefore, mutation analysis was performed by direct sequencing in most of the 135 annotated genes located in the region. The gene (HADH2) encoding L-3-hydroxyacyl-CoA dehydrogenase II displayed a sequence alteration (c.574 C-->A; p.R192R) in all patients and carrier females that was absent in unaffected male family members and could not be found in 2,500 control X chromosomes, including in those of 500 healthy males. The silent C-->A substitution is located in exon 5 and was shown by western blot to reduce the amount of HADH2 protein by 60%-70% in the patient. Quantitative in vivo and in vitro expression studies revealed a ratio of splicing transcript amounts different from those normally seen in controls. Apparently, the reduced expression of the wild-type fragment, which results in the decreased protein expression, rather than the increased amount of aberrant splicing fragments of the HADH2 gene, is pathogenic. Our data therefore strongly suggest that reduced expression of the HADH2 protein causes MRXS10, a phenotype different from that caused by 2-methyl-3-hydroxybutyryl-CoA dehydrogenase deficiency, which is a neurodegenerative disorder caused by missense mutations in this multifunctional protein.
Abstract:
This chapter addresses data modelling as a means of promoting statistical literacy in the early grades. Consideration is first given to the importance of increasing young children’s exposure to statistical reasoning experiences and how data modelling can be a rich means of doing so. Selected components of data modelling are then reviewed, followed by a report on some findings from the third year of a three-year longitudinal study across grades one through three.
Abstract:
A variety of sustainable development research efforts and related activities are attempting to reconcile the issues of conserving our natural resources without limiting economic motivation while also improving our social equity and quality of life. Land use/land cover change, occurring on a global scale, is an aggregate of local land use decisions and profoundly impacts our environment. It is therefore the local decision making process that should be the eventual target of many of the ongoing data collection and research efforts which strive toward supporting a sustainable future. Satellite imagery data is a primary source of data upon which to build a core data set for use by researchers in analyzing this global change. A process is necessary to link global change research, utilizing satellite imagery, to the local land use decision making process. One example of this is the NASA-sponsored Regional Data Center (RDC) prototype. The RDC approach is an attempt to integrate science and technology at the community level. The anticipated result of this complex interaction between research and the decision making communities will be realized in the form of long-term benefits to the public.
Abstract:
Historically, it appears that some of the WRCF have survived because i) they lack sufficient quantity of commercially valuable species; ii) they are located in remote or inaccessible areas; or iii) they have been protected as national parks and sanctuaries. Forests will be protected when people who are deciding the fate of forests conclude that the conservation of forests is more beneficial, e.g. generates higher incomes or has cultural or social values, than their clearance. If this is not the case, forests will continue to be cleared and converted. In the future, the WRCF may be protected only by focused attention. The future policy options may include strategies for strong protection measures, the raising of public awareness about the value of forests, and concerted actions for reducing pressure on forest lands by providing alternatives to forest exploitation to meet the growing demands of forest products. Many areas with low population densities offer an opportunity for conservation if appropriate steps are taken now by the national governments and international community. This opportunity must be founded upon the increased public and government awareness that forests have vast importance to the welfare of humans and ecosystems' services such as biodiversity, watershed protection, and carbon balance. Also paramount to this opportunity is the increased scientific understanding of forest dynamics and technical capability to install global observation and assessment systems. High-resolution satellite data such as Landsat 7 and other technologically advanced satellite programs will provide unprecedented monitoring options for governing authorities. Technological innovation can contribute to the way forests are protected. The use of satellite imagery for regular monitoring and Internet for information dissemination provide effective tools for raising worldwide awareness about the significance of forests and intrinsic value of nature.
Abstract:
The complex supply chain relations of the construction industry, coupled with the substantial amount of information to be shared on a regular basis between the parties involved, make the traditional paper-based data interchange methods inefficient, error prone and expensive. The successful information technology (IT) applications that enable seamless data interchange, such as the Electronic Data Interchange (EDI) systems, have generally failed to be successfully implemented in the construction industry. An alternative emerging technology, Extensible Markup Language (XML), and its applicability to streamline business processes and to improve data interchange methods within the construction industry are analysed, as is the EDI technology to identify the strategic advantages that XML technology provides to overcome the barriers to implementation. In addition, the successful implementation of XML-based automated data interchange platforms for a large organization, and the proposed benefits thereof, are presented as a case study.
Abstract:
Enterprises, both public and private, have rapidly commenced using the benefits of enterprise resource planning (ERP) combined with business analytics and “open data sets” which are often outside the control of the enterprise to gain further efficiencies, build new service operations and increase business activity. In many cases, these business activities are based around relevant software systems hosted in a “cloud computing” environment. “Garbage in, garbage out”, or “GIGO”, is a term long used to describe problems in unqualified dependency on information systems, dating from the 1960s. However, a more pertinent variation arose sometime later, namely “garbage in, gospel out” signifying that with large scale information systems, such as ERP and usage of open datasets in a cloud environment, the ability to verify the authenticity of those data sets used may be almost impossible, resulting in dependence upon questionable results. Illicit data set “impersonation” becomes a reality. At the same time the ability to audit such results may be an important requirement, particularly in the public sector. This paper discusses the need for enhancement of identity, reliability, authenticity and audit services, including naming and addressing services, in this emerging environment and analyses some current technologies that are offered and which may be appropriate. However, severe limitations to addressing these requirements have been identified and the paper proposes further research work in the area.
Abstract:
Enterprise resource planning (ERP) systems are rapidly being combined with “big data” analytics processes and publicly available “open data sets”, which are usually outside the arena of the enterprise, to expand activity through better service to current clients as well as identifying new opportunities. Moreover, these activities are now largely based around relevant software systems hosted in a “cloud computing” environment. The over 50-year-old phrase related to mistrust in computer systems, namely “garbage in, garbage out” or “GIGO”, is used to describe problems of unqualified and unquestioning dependency on information systems. However, a more relevant GIGO interpretation arose sometime later, namely “garbage in, gospel out”, signifying that with large scale information systems based around ERP and open datasets as well as “big data” analytics, particularly in a cloud environment, the ability to verify the authenticity and integrity of the data sets used may be almost impossible. In turn, this may easily result in decision making based upon questionable results which are unverifiable. Illicit “impersonation” of and modifications to legitimate data sets may become a reality while at the same time the ability to audit any derived results of analysis may be an important requirement, particularly in the public sector. The pressing need for enhancement of identity, reliability, authenticity and audit services, including naming and addressing services, in this emerging environment is discussed in this paper. Some appropriate technologies currently being offered are also examined. However, severe limitations in addressing the problems identified are found and the paper proposes further necessary research work for the area.
(Note: This paper is based on an earlier unpublished paper/presentation “Identity, Addressing, Authenticity and Audit Requirements for Trust in ERP, Analytics and Big/Open Data in a ‘Cloud’ Computing Environment: A Review and Proposal” presented to the Department of Accounting and IT, College of Management, National Chung Chen University, 20 November 2013.)
Abstract:
One of the concerns about the use of Bluetooth MAC Scanner (BMS) data, especially from urban arterials, is the bias in the travel time estimates caused by multiple Bluetooth devices being transported by a single vehicle. For instance, if a bus is transporting 20 passengers with Bluetooth-equipped mobile phones, then the discovery of these mobile phones by BMS will be counted as 20 different vehicles, and the average travel time along the corridor estimated from the BMS data will be biased toward the travel time of the bus. This paper integrates a Bus Vehicle Identification system with a BMS network to empirically evaluate such bias, if any. The paper also reports an interesting finding on the uniqueness of MAC IDs.
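The bias described in this abstract can be illustrated with a small arithmetic sketch. It is not the paper's method: the record layout, the `mean_travel_time` helper, the assumption that each detection can be attributed to a vehicle, and all travel times are hypothetical values chosen only for illustration.

```python
from statistics import mean

def mean_travel_time(records, per_vehicle=False):
    """Average travel time over (mac_id, vehicle_id, travel_time_s) records.

    per_vehicle=False: every detected MAC ID counts as one vehicle (naive).
    per_vehicle=True: detections are first averaged within each vehicle,
    assuming (hypothetically) that vehicle_id is known for every record.
    """
    if not per_vehicle:
        return mean(t for _, _, t in records)
    by_vehicle = {}
    for _, vehicle_id, t in records:
        by_vehicle.setdefault(vehicle_id, []).append(t)
    return mean(mean(ts) for ts in by_vehicle.values())

# Hypothetical corridor: 10 cars at 300 s each, one bus at 480 s with 20 phones.
cars = [(f"mac-car-{i}", f"car-{i}", 300.0) for i in range(10)]
bus = [(f"mac-bus-{i}", "bus-1", 480.0) for i in range(20)]
records = cars + bus

naive = mean_travel_time(records)                       # 30 "vehicles"
corrected = mean_travel_time(records, per_vehicle=True)  # 11 vehicles
print(f"naive: {naive:.1f} s, per vehicle: {corrected:.1f} s")
```

With these made-up numbers the naive estimate is pulled toward the bus because its 20 phones outnumber the 10 cars, while the per-vehicle estimate counts the bus once.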
Abstract:
Loop detectors are the oldest and most widely used traffic data source. On urban arterials, they are mainly installed for signal control. Recently, state-of-the-art Bluetooth MAC Scanners (BMS) have captured significant interest from stakeholders for area-wide traffic monitoring. Loop detectors provide flow, a fundamental traffic parameter, whereas BMS provide individual vehicle travel times between BMS stations. Hence, these two data sources complement each other, and if integrated should increase the accuracy and reliability of traffic state estimation. This paper proposes a model that integrates loop and BMS data for seamless travel time and density estimation on urban signalised networks. The proposed model is validated using both real and simulated data, and the results indicate that the accuracy of the proposed model is over 90%.