32 resultados para Unicode Common Locale Data Repository


Relevância:

40.00% 40.00%

Publicador:

Resumo:

Americans are accustomed to a wide range of data collection in their lives: census, polls, surveys, user registrations, and disclosure forms. When logging onto the Internet, users’ actions are being tracked everywhere: clicking, typing, tapping, swiping, searching, and placing orders. All of this data is stored to create data-driven profiles of each user. Social network sites, furthermore, set the voluntarily sharing of personal data as the default mode of engagement. But people’s time and energy devoted to creating this massive amount of data, on paper and online, are taken for granted. Few people would consider their time and energy spent on data production as labor. Even if some people do acknowledge their labor for data, they believe it is accessory to the activities at hand. In the face of pervasive data collection and the rising time spent on screens, why do people keep ignoring their labor for data? How has labor for data been become invisible, as something that is disregarded by many users? What does invisible labor for data imply for everyday cultural practices in the United States? Invisible Labor for Data addresses these questions. I argue that three intertwined forces contribute to framing data production as being void of labor: data production institutions throughout history, the Internet’s technological infrastructure (especially with the implementation of algorithms), and the multiplication of virtual spaces. There is a common tendency in the framework of human interactions with computers to deprive data and bodies of their materiality. My Introduction and Chapter 1 offer theoretical interventions by reinstating embodied materiality and redefining labor for data as an ongoing process. The middle Chapters present case studies explaining how labor for data is pushed to the margin of the narratives about data production. I focus on a nationwide debate in the 1960s on whether the U.S. should build a databank, contemporary Big Data practices in the data broker and the Internet industries, and the group of people who are hired to produce data for other people’s avatars in the virtual games. I conclude with a discussion on how the new development of crowdsourcing projects may usher in the new chapter in exploiting invisible and discounted labor for data.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Institutions are widely regarded as important, even ultimate drivers of economic growth and performance. A recent mainstream of institutional economics has concentrated on the effect of persisting, often imprecisely measured institutions and on cataclysmic events as agents of noteworthy institutional change. As a consequence, institutional change without large-scale shocks has received little attention. In this dissertation I apply a complementary, quantitative-descriptive approach that relies on measures of actually enforced institutions to study institutional persistence and change over a long time period that is undisturbed by the typically studied cataclysmic events. By placing institutional change into the center of attention one can recognize different speeds of institutional innovation and the continuous coexistence of institutional persistence and change. Specifically, I combine text mining procedures, network analysis techniques and statistical approaches to study persistence and change in England’s common law over the Industrial Revolution (1700-1865). Based on the doctrine of precedent - a peculiarity of common law systems - I construct and analyze the apparently first citation network that reflects lawmaking in England. Most strikingly, I find large-scale change in the making of English common law around the turn of the 19th century - a period free from the typically studied cataclysmic events. Within a few decades a legal innovation process with low depreciation rates (1 to 2 percent) and strong past-persistence transitioned to a present-focused innovation process with significantly higher depreciation rates (4 to 6 percent) and weak past-persistence. Comparison with U.S. Supreme Court data reveals a similar U.S. transition towards the end of the 19th century. The English and U.S. transitions appear to have unfolded in a very specific manner: a new body of law arose during the transitions and developed in a self-referential manner while the existing body of law lost influence, but remained prominent. Additional findings suggest that Parliament doubled its influence on the making of case law within the first decades after the Glorious Revolution and that England’s legal rules manifested a high degree of long-term persistence. The latter allows for the possibility that the often-noted persistence of institutional outcomes derives from the actual persistence of institutions.

Relevância:

40.00% 40.00%

Publicador:

Resumo:

Cancer and cardio-vascular diseases are the leading causes of death world-wide. Caused by systemic genetic and molecular disruptions in cells, these disorders are the manifestation of profound disturbance of normal cellular homeostasis. People suffering or at high risk for these disorders need early diagnosis and personalized therapeutic intervention. Successful implementation of such clinical measures can significantly improve global health. However, development of effective therapies is hindered by the challenges in identifying genetic and molecular determinants of the onset of diseases; and in cases where therapies already exist, the main challenge is to identify molecular determinants that drive resistance to the therapies. Due to the progress in sequencing technologies, the access to a large genome-wide biological data is now extended far beyond few experimental labs to the global research community. The unprecedented availability of the data has revolutionized the capabilities of computational researchers, enabling them to collaboratively address the long standing problems from many different perspectives. Likewise, this thesis tackles the two main public health related challenges using data driven approaches. Numerous association studies have been proposed to identify genomic variants that determine disease. However, their clinical utility remains limited due to their inability to distinguish causal variants from associated variants. In the presented thesis, we first propose a simple scheme that improves association studies in supervised fashion and has shown its applicability in identifying genomic regulatory variants associated with hypertension. Next, we propose a coupled Bayesian regression approach -- eQTeL, which leverages epigenetic data to estimate regulatory and gene interaction potential, and identifies combinations of regulatory genomic variants that explain the gene expression variance. On human heart data, eQTeL not only explains a significantly greater proportion of expression variance in samples, but also predicts gene expression more accurately than other methods. We demonstrate that eQTeL accurately detects causal regulatory SNPs by simulation, particularly those with small effect sizes. Using various functional data, we show that SNPs detected by eQTeL are enriched for allele-specific protein binding and histone modifications, which potentially disrupt binding of core cardiac transcription factors and are spatially proximal to their target. eQTeL SNPs capture a substantial proportion of genetic determinants of expression variance and we estimate that 58% of these SNPs are putatively causal. The challenge of identifying molecular determinants of cancer resistance so far could only be dealt with labor intensive and costly experimental studies, and in case of experimental drugs such studies are infeasible. Here we take a fundamentally different data driven approach to understand the evolving landscape of emerging resistance. We introduce a novel class of genetic interactions termed synthetic rescues (SR) in cancer, which denotes a functional interaction between two genes where a change in the activity of one vulnerable gene (which may be a target of a cancer drug) is lethal, but subsequently altered activity of its partner rescuer gene restores cell viability. Next we describe a comprehensive computational framework --termed INCISOR-- for identifying SR underlying cancer resistance. Applying INCISOR to mine The Cancer Genome Atlas (TCGA), a large collection of cancer patient data, we identified the first pan-cancer SR networks, composed of interactions common to many cancer types. We experimentally test and validate a subset of these interactions involving the master regulator gene mTOR. We find that rescuer genes become increasingly activated as breast cancer progresses, testifying to pervasive ongoing rescue processes. We show that SRs can be utilized to successfully predict patients' survival and response to the majority of current cancer drugs, and importantly, for predicting the emergence of drug resistance from the initial tumor biopsy. Our analysis suggests a potential new strategy for enhancing the effectiveness of existing cancer therapies by targeting their rescuer genes to counteract resistance. The thesis provides statistical frameworks that can harness ever increasing high throughput genomic data to address challenges in determining the molecular underpinnings of hypertension, cardiovascular disease and cancer resistance. We discover novel molecular mechanistic insights that will advance the progress in early disease prevention and personalized therapeutics. Our analyses sheds light on the fundamental biological understanding of gene regulation and interaction, and opens up exciting avenues of translational applications in risk prediction and therapeutics.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In 2005, the University of Maryland acquired over 70 digital videos spanning 35 years of Jim Henson’s groundbreaking work in television and film. To support in-house discovery and use, the collection was cataloged in detail using AACR2 and MARC21, and a web-based finding aid was also created. In the past year, I created an "r-ball" (a linked data set described using RDA) of these same resources. The presentation will compare and contrast these three ways of accessing the Jim Henson Works collection, with insights gleaned from providing resource discovery using RIMMF (RDA in Many Metadata Formats).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Presentation from the MARAC conference in Pittsburgh, PA on April 14–16, 2016. S13 - Student Poster Session; Analysis of Federal Policy on Public Access to Scientific Research Data

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In today’s big data world, data is being produced in massive volumes, at great velocity and from a variety of different sources such as mobile devices, sensors, a plethora of small devices hooked to the internet (Internet of Things), social networks, communication networks and many others. Interactive querying and large-scale analytics are being increasingly used to derive value out of this big data. A large portion of this data is being stored and processed in the Cloud due the several advantages provided by the Cloud such as scalability, elasticity, availability, low cost of ownership and the overall economies of scale. There is thus, a growing need for large-scale cloud-based data management systems that can support real-time ingest, storage and processing of large volumes of heterogeneous data. However, in the pay-as-you-go Cloud environment, the cost of analytics can grow linearly with the time and resources required. Reducing the cost of data analytics in the Cloud thus remains a primary challenge. In my dissertation research, I have focused on building efficient and cost-effective cloud-based data management systems for different application domains that are predominant in cloud computing environments. In the first part of my dissertation, I address the problem of reducing the cost of transactional workloads on relational databases to support database-as-a-service in the Cloud. The primary challenges in supporting such workloads include choosing how to partition the data across a large number of machines, minimizing the number of distributed transactions, providing high data availability, and tolerating failures gracefully. I have designed, built and evaluated SWORD, an end-to-end scalable online transaction processing system, that utilizes workload-aware data placement and replication to minimize the number of distributed transactions that incorporates a suite of novel techniques to significantly reduce the overheads incurred both during the initial placement of data, and during query execution at runtime. In the second part of my dissertation, I focus on sampling-based progressive analytics as a means to reduce the cost of data analytics in the relational domain. Sampling has been traditionally used by data scientists to get progressive answers to complex analytical tasks over large volumes of data. Typically, this involves manually extracting samples of increasing data size (progressive samples) for exploratory querying. This provides the data scientists with user control, repeatable semantics, and result provenance. However, such solutions result in tedious workflows that preclude the reuse of work across samples. On the other hand, existing approximate query processing systems report early results, but do not offer the above benefits for complex ad-hoc queries. I propose a new progressive data-parallel computation framework, NOW!, that provides support for progressive analytics over big data. In particular, NOW! enables progressive relational (SQL) query support in the Cloud using unique progress semantics that allow efficient and deterministic query processing over samples providing meaningful early results and provenance to data scientists. NOW! enables the provision of early results using significantly fewer resources thereby enabling a substantial reduction in the cost incurred during such analytics. Finally, I propose NSCALE, a system for efficient and cost-effective complex analytics on large-scale graph-structured data in the Cloud. The system is based on the key observation that a wide range of complex analysis tasks over graph data require processing and reasoning about a large number of multi-hop neighborhoods or subgraphs in the graph; examples include ego network analysis, motif counting in biological networks, finding social circles in social networks, personalized recommendations, link prediction, etc. These tasks are not well served by existing vertex-centric graph processing frameworks whose computation and execution models limit the user program to directly access the state of a single vertex, resulting in high execution overheads. Further, the lack of support for extracting the relevant portions of the graph that are of interest to an analysis task and loading it onto distributed memory leads to poor scalability. NSCALE allows users to write programs at the level of neighborhoods or subgraphs rather than at the level of vertices, and to declaratively specify the subgraphs of interest. It enables the efficient distributed execution of these neighborhood-centric complex analysis tasks over largescale graphs, while minimizing resource consumption and communication cost, thereby substantially reducing the overall cost of graph data analytics in the Cloud. The results of our extensive experimental evaluation of these prototypes with several real-world data sets and applications validate the effectiveness of our techniques which provide orders-of-magnitude reductions in the overheads of distributed data querying and analysis in the Cloud.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Certain environments can inhibit learning and stifle enthusiasm, while others enhance learning or stimulate curiosity. Furthermore, in a world where technological change is accelerating we could ask how might architecture connect resource abundant and resource scarce innovation environments? Innovation environments developed out of necessity within urban villages and those developed with high intention and expectation within more institutionalized settings share a framework of opportunity for addressing change through learning and education. This thesis investigates formal and informal learning environments and how architecture can stimulate curiosity, enrich learning, create common ground, and expand access to education. The reason for this thesis exploration is to better understand how architects might design inclusive environments that bring people together to build sustainable infrastructure encouraging innovation and adaptation to change for years to come. The context of this thesis is largely based on Colin McFarlane’s theory that the “city is an assemblage for learning” The socio-spatial perspective in urbanism, considers how built infrastructure and society interact. Through the urban realm, inhabitants learn to negotiate people, space, politics, and resources affecting their daily lives. The city is therefore a dynamic field of emergent possibility. This thesis uses the city as a lens through which the boundaries between informal and formal logics as well as the public and private might be blurred. Through analytical processes I have examined the environmental devices and assemblage of factors that consistently provide conditions through which learning may thrive. These parameters that make a creative space significant can help suggest the design of common ground environments through which innovation is catalyzed.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Large component-based systems are often built from many of the same components. As individual component-based software systems are developed, tested and maintained, these shared components are repeatedly manipulated. As a result there are often significant overlaps and synergies across and among the different test efforts of different component-based systems. However, in practice, testers of different systems rarely collaborate, taking a test-all-by-yourself approach. As a result, redundant effort is spent testing common components, and important information that could be used to improve testing quality is lost. The goal of this research is to demonstrate that, if done properly, testers of shared software components can save effort by avoiding redundant work, and can improve the test effectiveness for each component as well as for each component-based software system by using information obtained when testing across multiple components. To achieve this goal I have developed collaborative testing techniques and tools for developers and testers of component-based systems with shared components, applied the techniques to subject systems, and evaluated the cost and effectiveness of applying the techniques. The dissertation research is organized in three parts. First, I investigated current testing practices for component-based software systems to find the testing overlap and synergy we conjectured exists. Second, I designed and implemented infrastructure and related tools to facilitate communication and data sharing between testers. Third, I designed two testing processes to implement different collaborative testing algorithms and applied them to large actively developed software systems. This dissertation has shown the benefits of collaborative testing across component developers who share their components. With collaborative testing, researchers can design algorithms and tools to support collaboration processes, achieve better efficiency in testing configurations, and discover inter-component compatibility faults within a minimal time window after they are introduced.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Increases in pediatric thyroid cancer incidence could be partly due to previous clinical intervention. This retrospective cohort study used 1973-2012 data from the Surveillance Epidemiology and End Results program to assess the association between previous radiation therapy exposure in development of second primary thyroid cancer (SPTC) among 0-19-year-old children. Statistical analysis included the calculation of summary statistics and univariable and multivariable logistic regression analysis. Relative to no previous radiation therapy exposure, cases exposed to radiation had 2.46 times the odds of developing SPTC (95% CI: 1.39-4.34). After adjustment for sex and age at diagnosis, Hispanic children who received radiation therapy for a first primary malignancy had 3.51 times the odds of developing SPTC compared to Hispanic children who had not received radiation therapy, [AOR=3.51, 99% CI: 0.69-17.70, p=0.04]. These findings support the development of age-specific guidelines for the use of radiation based interventions among children with and without cancer.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

There is a long history of debate around mathematics standards, reform efforts, and accountability. This research identified ways that national expectations and context drive local implementation of mathematics reform efforts and identified the external and internal factors that impact teachers’ acceptance or resistance to policy implementation at the local level. This research also adds to the body of knowledge about acceptance and resistance to policy implementation efforts. This case study involved the analysis of documents to provide a chronological perspective, assess the current state of the District’s mathematics reform, and determine the District’s readiness to implement the Common Core Curriculum. The school system in question has continued to struggle with meeting the needs of all students in Algebra 1. Therefore, the results of this case study will be useful to the District’s leaders as they include the compilation and analysis of a decade’s worth of data specific to Algebra 1.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Observational studies demonstrate strong associations between deficient serum vitamin D (25(OH)D) levels and cardiovascular disease. To further examine the association between vitamin D and hypertension (HTN), data from the 2003-2006 National Health and Nutrition Examination Survey were analyzed to assess whether the association between vitamin D and HTN varies by sufficiency of key co-nutrients necessary for metabolic vitamin D reactions to occur. Logistic regression results demonstrate independent effect modification by calcium, magnesium, and vitamin A on the association between vitamin D and HTN. Among non-pregnant adults with adequate renal function, those with low levels of calcium, magnesium, and vitamin D levels had 1.75 times the odds of HTN compared to those with sufficient vitamin D levels (p = <0.0001). Additionally, participants with low levels of calcium, magnesium, vitamin A, and vitamin D had 5.43 times the odds of HTN compared to those with vitamin D sufficiency (p = 0.0103).

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Increasing the size of training data in many computer vision tasks has shown to be very effective. Using large scale image datasets (e.g. ImageNet) with simple learning techniques (e.g. linear classifiers) one can achieve state-of-the-art performance in object recognition compared to sophisticated learning techniques on smaller image sets. Semantic search on visual data has become very popular. There are billions of images on the internet and the number is increasing every day. Dealing with large scale image sets is intense per se. They take a significant amount of memory that makes it impossible to process the images with complex algorithms on single CPU machines. Finding an efficient image representation can be a key to attack this problem. A representation being efficient is not enough for image understanding. It should be comprehensive and rich in carrying semantic information. In this proposal we develop an approach to computing binary codes that provide a rich and efficient image representation. We demonstrate several tasks in which binary features can be very effective. We show how binary features can speed up large scale image classification. We present learning techniques to learn the binary features from supervised image set (With different types of semantic supervision; class labels, textual descriptions). We propose several problems that are very important in finding and using efficient image representation.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The predictive capabilities of computational fire models have improved in recent years such that models have become an integral part of many research efforts. Models improve the understanding of the fire risk of materials and may decrease the number of expensive experiments required to assess the fire hazard of a specific material or designed space. A critical component of a predictive fire model is the pyrolysis sub-model that provides a mathematical representation of the rate of gaseous fuel production from condensed phase fuels given a heat flux incident to the material surface. The modern, comprehensive pyrolysis sub-models that are common today require the definition of many model parameters to accurately represent the physical description of materials that are ubiquitous in the built environment. Coupled with the increase in the number of parameters required to accurately represent the pyrolysis of materials is the increasing prevalence in the built environment of engineered composite materials that have never been measured or modeled. The motivation behind this project is to develop a systematic, generalized methodology to determine the requisite parameters to generate pyrolysis models with predictive capabilities for layered composite materials that are common in industrial and commercial applications. This methodology has been applied to four common composites in this work that exhibit a range of material structures and component materials. The methodology utilizes a multi-scale experimental approach in which each test is designed to isolate and determine a specific subset of the parameters required to define a material in the model. Data collected in simultaneous thermogravimetry and differential scanning calorimetry experiments were analyzed to determine the reaction kinetics, thermodynamic properties, and energetics of decomposition for each component of the composite. Data collected in microscale combustion calorimetry experiments were analyzed to determine the heats of complete combustion of the volatiles produced in each reaction. Inverse analyses were conducted on sample temperature data collected in bench-scale tests to determine the thermal transport parameters of each component through degradation. Simulations of quasi-one-dimensional bench-scale gasification tests generated from the resultant models using the ThermaKin modeling environment were compared to experimental data to independently validate the models.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Edge-labeled graphs have proliferated rapidly over the last decade due to the increased popularity of social networks and the Semantic Web. In social networks, relationships between people are represented by edges and each edge is labeled with a semantic annotation. Hence, a huge single graph can express many different relationships between entities. The Semantic Web represents each single fragment of knowledge as a triple (subject, predicate, object), which is conceptually identical to an edge from subject to object labeled with predicates. A set of triples constitutes an edge-labeled graph on which knowledge inference is performed. Subgraph matching has been extensively used as a query language for patterns in the context of edge-labeled graphs. For example, in social networks, users can specify a subgraph matching query to find all people that have certain neighborhood relationships. Heavily used fragments of the SPARQL query language for the Semantic Web and graph queries of other graph DBMS can also be viewed as subgraph matching over large graphs. Though subgraph matching has been extensively studied as a query paradigm in the Semantic Web and in social networks, a user can get a large number of answers in response to a query. These answers can be shown to the user in accordance with an importance ranking. In this thesis proposal, we present four different scoring models along with scalable algorithms to find the top-k answers via a suite of intelligent pruning techniques. The suggested models consist of a practically important subset of the SPARQL query language augmented with some additional useful features. The first model called Substitution Importance Query (SIQ) identifies the top-k answers whose scores are calculated from matched vertices' properties in each answer in accordance with a user-specified notion of importance. The second model called Vertex Importance Query (VIQ) identifies important vertices in accordance with a user-defined scoring method that builds on top of various subgraphs articulated by the user. Approximate Importance Query (AIQ), our third model, allows partial and inexact matchings and returns top-k of them with a user-specified approximation terms and scoring functions. In the fourth model called Probabilistic Importance Query (PIQ), a query consists of several sub-blocks: one mandatory block that must be mapped and other blocks that can be opportunistically mapped. The probability is calculated from various aspects of answers such as the number of mapped blocks, vertices' properties in each block and so on and the most top-k probable answers are returned. An important distinguishing feature of our work is that we allow the user a huge amount of freedom in specifying: (i) what pattern and approximation he considers important, (ii) how to score answers - irrespective of whether they are vertices or substitution, and (iii) how to combine and aggregate scores generated by multiple patterns and/or multiple substitutions. Because so much power is given to the user, indexing is more challenging than in situations where additional restrictions are imposed on the queries the user can ask. The proposed algorithms for the first model can also be used for answering SPARQL queries with ORDER BY and LIMIT, and the method for the second model also works for SPARQL queries with GROUP BY, ORDER BY and LIMIT. We test our algorithms on multiple real-world graph databases, showing that our algorithms are far more efficient than popular triple stores.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The graph Laplacian operator is widely studied in spectral graph theory largely due to its importance in modern data analysis. Recently, the Fourier transform and other time-frequency operators have been defined on graphs using Laplacian eigenvalues and eigenvectors. We extend these results and prove that the translation operator to the i’th node is invertible if and only if all eigenvectors are nonzero on the i’th node. Because of this dependency on the support of eigenvectors we study the characteristic set of Laplacian eigenvectors. We prove that the Fiedler vector of a planar graph cannot vanish on large neighborhoods and then explicitly construct a family of non-planar graphs that do exhibit this property. We then prove original results in modern analysis on graphs. We extend results on spectral graph wavelets to create vertex-dyanamic spectral graph wavelets whose support depends on both scale and translation parameters. We prove that Spielman’s Twice-Ramanujan graph sparsifying algorithm cannot outperform his conjectured optimal sparsification constant. Finally, we present numerical results on graph conditioning, in which edges of a graph are rescaled to best approximate the complete graph and reduce average commute time.