917 results for structuration of lexical data bases


Relevance: 100.00%

Abstract:

Collecting data via a questionnaire and analyzing them while preserving respondents’ privacy may increase the number of respondents and the truthfulness of their responses. It may also reduce systematic differences between respondents and non-respondents. In this paper, we propose a privacy-preserving method for collecting and analyzing survey responses using secure multi-party computation (SMC). The method is secure under the semi-honest adversarial model and computes a wide variety of statistics. Total and stratified statistical counts are computed using the secure protocols developed in this paper. Additional statistics, such as contingency tables, chi-square tests, odds ratios, and logistic regression, are then computed within the R statistical environment using the statistical counts as building blocks. The method was evaluated on a questionnaire dataset of 3,158 respondents sampled for a medical study and on simulated questionnaire datasets of up to 50,000 respondents. The computation time for the statistical analyses scales linearly with the number of respondents. The results show that the method is efficient and scalable enough for practical use, and it can also be applied in other settings in which categorical data are collected.
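The abstract's key design choice is that only aggregate counts are computed securely; richer statistics are then derived from those counts in the clear. As a rough illustration of that second stage (in Python rather than R, with invented cell counts, not data from the paper), here is how an odds ratio and a Pearson chi-square statistic follow directly from the four cells of a 2x2 contingency table once the aggregate totals are available:

```python
# Hypothetical 2x2 contingency table of aggregate counts (exposure vs.
# outcome). The values are illustrative only, not taken from the paper.
a, b = 120, 380   # exposed: with outcome / without outcome
c, d = 60, 440    # unexposed: with outcome / without outcome

n = a + b + c + d

# Odds ratio from the four cell counts.
odds_ratio = (a * d) / (b * c)

# Pearson chi-square statistic for a 2x2 table (1 degree of freedom).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(round(odds_ratio, 3), round(chi2, 3))  # odds ratio ~2.316, chi2 ~24.39
```

The point of the sketch is that no individual responses are needed at this stage: every quantity is a function of the four counts, which is what allows the secure protocols to stop at the counting step.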

Relevance: 100.00%

Abstract:

Much has been written about Big Data from technical, economic, juridical and ethical perspectives. Still, very little empirical and comparative data is available on how Big Data is approached and regulated in Europe and beyond. This contribution makes a first effort to fill that gap by presenting the responses to a survey on Big Data from the Data Protection Authorities of fourteen European countries, together with comparative legal research covering eleven countries. It presents those results and addresses ten challenges for the regulation of Big Data.

Relevance: 100.00%

Abstract:

Background: The impact of cancer upon children, teenagers and young people can be profound. Research has been undertaken to explore these impacts, but little is known about how researchers can ‘best’ engage with this group to explore their experiences. This review provides an overview of the utility of data collection methods employed when undertaking research with children, teenagers and young people. A systematic review of relevant databases was undertaken using the search terms ‘young people’, ‘young adult’, ‘adolescent’ and ‘data collection methods’. The full text of papers deemed eligible from the title and abstract was accessed and, following discussion within the research team, thirty papers were included. Findings: Given the heterogeneity in the scope of the papers identified, the following data collection methods were included in the results. Three of the papers provided an overview of data collection methods used with this population; the remaining twenty-seven covered: digital technologies; arts-based research; comparisons of ‘paper and pencil’ research with web-based technologies; the use of games; the use of a specific communication tool; questionnaires and interviews; and focus groups and telephone interviews/questionnaires. The strengths and limitations of these data collection methods are discussed, drawing on issues such as the appropriateness of particular methods for particular age groups, or the most appropriate method to employ when exploring a particularly sensitive topic area. Conclusions: A number of data collection methods are used to undertake research with children, teenagers and young adults. This review summarises the currently available evidence, along with the strengths and limitations of the data collection methods employed.

Relevance: 100.00%

Abstract:

The use of secondary data in health care research has become a very important issue over the past few years. Data from the treatment context are being used for evaluation of medical data for external quality assurance, as well as to answer medical questions in the form of registers and research databases. Additionally, the establishment of electronic clinical systems like data warehouses provides new opportunities for the secondary use of clinical data. Because health data is among the most sensitive information about an individual, the data must be safeguarded from disclosure.

Relevance: 100.00%

Abstract:

Data mining, a widely discussed topic, has been studied in various fields. Its potential for refining decision-making, revealing hidden patterns and creating valuable knowledge has won the attention of scholars and practitioners. However, few studies have sought to combine data mining with libraries, where data generation occurs all the time. This thesis aims to fill that gap. It also explores the opportunities data mining creates for enhancing one of the most important elements of libraries: reference service. To demonstrate the feasibility and applicability of data mining thoroughly, the literature is reviewed to establish a critical understanding of data mining in libraries and of the current status of library reference service. The literature review indicates that free online data resources, other than data generated on social media, are rarely applied in current library data mining initiatives; this motivates the present study to utilise free online resources. Furthermore, a natural match between data mining and libraries is established. This match is explained by the data-rich reality of libraries and by viewing data mining as a form of knowledge, an easy choice for libraries, and a sensible method for overcoming reference service challenges. The match, especially the prospect that data mining can support library reference service, lays the main theoretical foundation for the empirical work in this study. Turku Main Library was selected as the case to answer the research question: is data mining feasible and applicable for improving reference service? Daily visits to Turku Main Library from 2009 to 2015 serve as the resource for data mining, and corresponding weather conditions are collected from Weather Underground, which is freely available online.
Before analysis, the collected dataset is cleansed and preprocessed to ensure the quality of the data mining. Multiple regression analysis is employed to mine the final dataset: hourly visits are the dependent variable, while weather conditions, the Discomfort Index and the day of the week are the independent variables. Four seasonal models are established to predict visits in each season; patterns are identified in the different seasons and implications drawn from them. In addition, library-climate points are generated by a clustering method, which simplifies the process of using weather data to forecast library visits for librarians. The data mining results are then interpreted from the perspective of improving reference service. Finally, the results of the case study are presented to librarians in order to collect professional opinions on employing data mining to improve reference service. The opinions collected are positive, which suggests that data mining is a feasible tool for enhancing library reference service.
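The regression step described above can be sketched as follows. This is a minimal illustration, not the thesis's actual model: the predictor names (temperature and discomfort index), the coefficients and the data are all invented, and ordinary least squares is solved directly via the normal equations.

```python
# Minimal OLS sketch: fit hourly visits against weather predictors.
# All numbers below are synthetic and purely illustrative.

def ols(X, y):
    """Solve (X'X) beta = X'y by Gaussian elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(X[r][i] * X[r][j] for r in range(len(X))) for j in range(k)]
         for i in range(k)]
    b = [sum(X[r][i] * y[r] for r in range(len(X))) for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= f * A[col][j]
            b[r] -= f * b[col]
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (b[i] - sum(A[i][j] * beta[j]
                              for j in range(i + 1, k))) / A[i][i]
    return beta

# Synthetic, noiseless data: visits = 200 + 5*temperature - 3*discomfort.
data = [(t, d) for t in range(-10, 21, 5) for d in range(0, 31, 10)]
X = [[1.0, t, d] for t, d in data]   # column of 1s gives the intercept
y = [200 + 5 * t - 3 * d for t, d in data]

beta = ols(X, y)
print([round(v, 6) for v in beta])  # recovers [200.0, 5.0, -3.0]
```

With real visit counts the residual noise would of course be substantial; the fixed-effect structure (one fitted model per season, as the thesis describes) would simply mean calling a routine like this once per seasonal subset.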

Relevance: 100.00%

Abstract:

The speed with which data has moved from being scarce, expensive and valuable, thus justifying detailed and careful verification and analysis, to a situation where streams of detailed data are almost too large to handle has caused a series of shifts. Legal systems already have severe problems keeping up with, or even in touch with, the rate at which unexpected outcomes flow from information technology. Until recently, Big Data applications were driven by the capacity to harness massive quantities of existing data. Now real-time data flows are rising swiftly, becoming more invasive and offering monitoring potential that is eagerly sought by commerce and government alike. The ambiguities as to who owns this often remarkably intrusive personal data need to be resolved, and rapidly, but resolution is likely to encounter rising resistance from industrial and commercial bodies who see this data flow as ‘theirs’. Many changes in ICT have strained the resolution of conflicts between IP exploiters and their customers, but this one is of a different scale, owing to the wide potential for individual customisation of pricing and identification, and to the rising commercial value of integrated streams of diverse personal data. A new reconciliation between the parties involved is needed: new business models, and a shift from the current confusion over who owns what data towards alignments that better accord with community expectations. After all, they are the customers, and the emergence of information monopolies needs to be balanced by appropriate consumer and subject rights. This will be a difficult discussion, but one that is needed to realise the great benefits to all that are clearly available if these issues can be positively resolved. Customers need to make these data flows contestable in some form. These Big Data flows are only going to grow and become ever more instructive.
A better balance is necessary. For the first time these changes are directly affecting the governance of democracies, as the very effective micro-targeting tools deployed in recent elections have shown, yet the data gathered is not available to the subjects. This is not a survivable social model. The Private Data Commons needs our help. Businesses and governments exploit Big Data without regard for issues of legality, data quality, disparate data meanings, and process quality. This often results in poor decisions, with individuals bearing the greatest risk. The threats harbored by Big Data extend far beyond the individual, however, and call for new legal structures, business processes, and concepts such as a Private Data Commons. This Web extra is the audio part of a video in which author Marcus Wigan expands on his article "Big Data's Big Unintended Consequences".

Relevance: 100.00%

Abstract:

Nowadays, risks arising from the rapid development of the oil and gas industries are increasing significantly. As a result, one of the main concerns of industrial and environmental managers alike is the identification and assessment of such risks, so that appropriate proactive measures can be developed and maintained. Oil spills from stationary sources in offshore zones are among the accidents with multiple adverse impacts on marine ecosystems. Considering a site's current situation and the relevant requirements and standards, the risk assessment process is capable not only of recognising the probable causes of accidents but also of estimating the probability of occurrence and the severity of consequences. In this way, the results of risk assessment help managers and decision makers create and employ proper control methods. Most existing models for risk assessment of oil spills are built on accurate databases and the analysis of historical data, but such databases are not accessible in most zones, especially in developing countries, or else they are newly established and not yet usable. This reveals the necessity of using expert systems and fuzzy set theory, which make it possible to formalise the expertise and experience of specialists who have worked in petroliferous areas for many years. Moreover, in developing countries, damage to the environment and environmental resources is often not treated as a risk assessment priority and tends to be underestimated. For this reason, the model proposed in this research specifically addresses the environmental risk of oil spills from stationary sources in offshore zones.
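The fuzzy-set idea described above can be sketched in miniature: expert knowledge is encoded as membership functions over linguistic terms, and if-then rules combine them into a risk score. The linguistic terms, input scales and rules below are invented for illustration; they are not taken from the proposed model.

```python
# Toy fuzzy inference for risk = f(probability, severity).
# Scales, terms and rules are hypothetical, for illustration only.

def tri(x, a, b, c):
    """Triangular membership function peaking at b over support (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def risk_level(probability, severity):
    """Mamdani-style min inference: each rule's consequent risk score is
    weighted by the rule's firing strength (weighted-average defuzzification)."""
    # Membership of the inputs in linguistic terms on a 0..10 scale.
    p_low, p_high = tri(probability, -1, 0, 5), tri(probability, 5, 10, 11)
    s_low, s_high = tri(severity, -1, 0, 5), tri(severity, 5, 10, 11)
    # Rules: (firing strength, representative risk score of the consequent).
    rules = [
        (min(p_low, s_low), 2.0),    # low prob,  low severity  -> low risk
        (min(p_low, s_high), 5.0),   # low prob,  high severity -> medium
        (min(p_high, s_low), 5.0),   # high prob, low severity  -> medium
        (min(p_high, s_high), 9.0),  # high prob, high severity -> high
    ]
    total = sum(w for w, _ in rules)
    return sum(w * v for w, v in rules) / total if total else 0.0

print(risk_level(1.0, 1.0))  # low inputs  -> low risk score (2.0)
print(risk_level(8.0, 7.0))  # high inputs -> high risk score (9.0)
```

In a real application the membership functions and rule base would be elicited from the domain experts the abstract mentions, which is precisely what lets the approach substitute for the missing historical databases.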

Relevance: 100.00%

Abstract:

The status of five species of commercially exploited sharks within the Great Barrier Reef Marine Park (GBRMP) and south-east Queensland was assessed using a data-limited approach. The annual harvest rate, U, estimated empirically from tagging between 2011 and 2013, was compared with an analytically derived proxy for the optimal equilibrium harvest rate, U_MSY-lim. Median estimates of U for the three principal retained species, the Australian blacktip shark (Carcharhinus tilstoni), spot-tail shark (Carcharhinus sorrah) and spinner shark (Carcharhinus brevipinna), were 0.10, 0.06 and 0.07 year⁻¹, respectively. Median estimates of U for two retained, non-target species, the pigeye shark (Carcharhinus amboinensis) and Australian sharpnose shark (Rhizoprionodon taylori), were 0.27 and 0.01 year⁻¹, respectively. For all species except the Australian blacktip shark, the median ratio U/U_MSY-lim was below 1. The high vulnerability of this species to fishing, combined with its life history characteristics, meant that U_MSY-lim was low (0.04-0.07 year⁻¹) and that U/U_MSY-lim was likely above 1. Harvest of the Australian blacktip shark above U_MSY could place this species at greater risk of localised depletion in parts of the GBRMP. The results indicate that the much higher catches, and presumably higher U, during the early 2000s were likely unsustainable. The unexpectedly high U on the pigeye shark indicates that output-based management controls may not have been effective in reducing harvest levels for all species, particularly those caught incidentally by other fishing sectors, including the recreational sector. © 2016 Elsevier B.V.
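The status check described above reduces to a single ratio. The blacktip numbers below come from the abstract; the helper itself is only an illustration of the comparison, not the study's estimation method (which derives U from tagging data and U_MSY-lim analytically):

```python
# Compare the estimated annual harvest rate U against the optimal-harvest
# proxy U_MSY-lim; a ratio above 1 suggests harvest exceeds the proxy.

def status_ratio(u, u_msy_lim):
    """U / U_MSY-lim for one species and one proxy value."""
    return u / u_msy_lim

u_blacktip = 0.10               # median U, Australian blacktip shark (abstract)
u_msy_lim_range = (0.04, 0.07)  # proxy range reported for this species

ratios = [status_ratio(u_blacktip, lim) for lim in u_msy_lim_range]
print([round(r, 2) for r in ratios])  # -> [2.5, 1.43], both above 1
```

Even at the most optimistic end of the proxy range the ratio stays above 1, which is the arithmetic behind the abstract's conclusion that U/U_MSY-lim was likely above 1 for this species.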

Relevance: 100.00%

Abstract:

Time perception is studied with subjective or semi-objective psychophysical methods. With subjective methods, observers provide quantitative estimates of duration and data depict the psychophysical function relating subjective duration to objective duration. With semi-objective methods, observers provide categorical or comparative judgments of duration and data depict the psychometric function relating the probability of a certain judgment to objective duration. Both approaches are used to study whether subjective and objective time run at the same pace or whether time flies or slows down under certain conditions. We analyze theoretical aspects affecting the interpretation of data gathered with the most widely used semi-objective methods, including single-presentation and paired-comparison methods. For this purpose, a formal model of psychophysical performance is used in which subjective duration is represented via a psychophysical function and the scalar property. This provides the timing component of the model, which is invariant across methods. A decisional component that varies across methods reflects how observers use subjective durations to make judgments and give the responses requested under each method. Application of the model shows that psychometric functions in single-presentation methods are uninterpretable because the various influences on observed performance are inextricably confounded in the data. In contrast, data gathered with paired-comparison methods permit separating out those influences. Prevalent approaches to fitting psychometric functions to data are also discussed and shown to be inconsistent with widely accepted principles of time perception, implicitly assuming instead that subjective time equals objective time and that observed differences across conditions do not reflect differences in perceived duration but criterion shifts. 
These analyses prompt evidence-based recommendations for best methodological practice in studies on time perception.
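The two-component structure the abstract describes (a timing component obeying the scalar property plus a separate decisional component) can be sketched numerically. This is my own minimal formulation, not the authors' exact model: a paired-comparison "which was longer?" judgment where subjective durations are normally distributed with standard deviation proportional to duration, and a criterion shifts responses without any change in perceived duration.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_comparison_longer(t_std, t_cmp, weber=0.15, criterion=0.0):
    """Probability of judging the comparison longer than the standard.
    Timing component: subjective duration ~ Normal(t, weber*t), i.e. the
    scalar property. Decisional component: a response criterion that biases
    the judgment without altering perception."""
    sd = math.hypot(weber * t_std, weber * t_cmp)  # sd of the difference
    return phi((t_cmp - t_std - criterion) / sd)

# At objective equality the unbiased observer says "longer" half the time;
# a nonzero criterion shifts the psychometric function on its own.
print(round(p_comparison_longer(500, 500), 3))                  # 0.5
print(round(p_comparison_longer(500, 500, criterion=30.0), 3))  # below 0.5
```

This makes the abstract's point concrete: a lateral shift of the psychometric function is ambiguous between a change in perceived duration and a criterion change, and only designs that separate the two components (as paired comparison permits) can disentangle them.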

Relevance: 100.00%

Abstract:

Introduction: Baseline severity and clinical stroke syndrome (Oxford Community Stroke Project, OCSP) classification are predictors of outcome in stroke. We used data from the ‘Tinzaparin in Acute Ischaemic Stroke Trial’ (TAIST) to assess the relationship between stroke severity, early recovery, outcome and OCSP syndrome. Methods: TAIST was a randomised controlled trial assessing the safety and efficacy of tinzaparin versus aspirin in 1,484 patients with acute ischaemic stroke. Severity was measured as the Scandinavian Neurological Stroke Scale (SNSS) at baseline and days 4, 7 and 10, and baseline OCSP clinical classification recorded: total anterior circulation infarct (TACI), partial anterior circulation infarct (PACI), lacunar infarct (LACI) and posterior circulation infarction (POCI). Recovery was calculated as change in SNSS from baseline at day 4 and 10. The relationship between stroke syndrome and SNSS at days 4 and 10, and outcome (modified Rankin scale at 90 days) were assessed. Results: Stroke severity was significantly different between TACI (most severe) and LACI (mildest) at all four time points (p<0.001), with no difference between PACI and POCI. The largest change in SNSS score occurred between baseline and day 4; improvement was least in TACI (median 2 units), compared to other groups (median 3 units) (p<0.001). If SNSS did not improve by day 4, then early recovery and late functional outcome tended to be limited irrespective of clinical syndrome (SNSS, baseline: 31, day 10: 32; mRS, day 90: 4); patients who recovered early tended to continue to improve and had better functional outcome irrespective of syndrome (SNSS, baseline: 35, day 10: 50; mRS, day 90: 2). Conclusions: Although functional outcome is related to baseline clinical syndrome (best with LACI, worst with TACI), patients who improve early have a more favourable functional outcome, irrespective of their OCSP syndrome. 
Hence, patients with a TACI syndrome may still achieve a reasonable outcome if early recovery occurs.

Relevance: 100.00%

Abstract:

Background: Understanding transcriptional regulation by genome-wide microarray studies can contribute to unravel complex relationships between genes. Attempts to standardize the annotation of microarray data include the Minimum Information About a Microarray Experiment (MIAME) recommendations, the MAGE-ML format for data interchange, and the use of controlled vocabularies or ontologies. The existing software systems for microarray data analysis implement the mentioned standards only partially and are often hard to use and extend. Integration of genomic annotation data and other sources of external knowledge using open standards is therefore a key requirement for future integrated analysis systems. Results: The EMMA 2 software has been designed to resolve shortcomings with respect to full MAGE-ML and ontology support and makes use of modern data integration techniques. We present a software system that features comprehensive data analysis functions for spotted arrays, and for the most common synthesized oligo arrays such as Agilent, Affymetrix and NimbleGen. The system is based on the full MAGE object model. Analysis functionality is based on R and Bioconductor packages and can make use of a compute cluster for distributed services. Conclusion: Our model-driven approach for automatically implementing a full MAGE object model provides high flexibility and compatibility. Data integration via SOAP-based web-services is advantageous in a distributed client-server environment as the collaborative analysis of microarray data is gaining more and more relevance in international research consortia. The adequacy of the EMMA 2 software design and implementation has been proven by its application in many distributed functional genomics projects. Its scalability makes the current architecture suited for extensions towards future transcriptomics methods based on high-throughput sequencing approaches which have much higher computational requirements than microarrays.