922 results for Functional data analysis
Abstract:
Brazil has a great diversity of native fruits that are not always widely consumed and are often sold only in certain regions because of their poor post-harvest conservation. One such fruit is the yellow araçá (yellow guava), an interesting source of nutrients. To promote the consumption and use of this fruit among consumers in different regions of the country, this study evaluated the incorporation of the yellow Ya-cy araçá into a cereal bar formulation. The fruits were first evaluated for their physical and chemical characteristics and bioactive compounds at different stages of maturation (green, ripe and dried forms). The behavior of yellow guava under UV-C radiation was also evaluated. After these analyses, flour was obtained from ripe yellow guava and, after preliminary tests, added to the base cereal bar formulation. A 2² factorial design with a central point was used for the experimental planning and development of the formulations. The developed formulations were subjected to sensory evaluation, and the resulting data were treated with multivariate analysis (Principal Component Analysis, PCA). The formulation preferred in the sensory evaluation was then characterized for texture, physico-chemical composition (moisture, ash, lipids, proteins, carbohydrates, dietary fiber and caloric value), mineral content and fatty acid profile. The results indicated that the cereal bar with added yellow guava developed in this study is one way to apply and use the fruit, extending its consumption to different regions of the country; the bar can be considered a functional product, not only because it contains the fruit in its composition but also because it provides many beneficial nutrients that contribute to consumer health.
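Since the formulation step hinges on a 2² factorial design with a central point followed by a PCA of the sensory ratings, a minimal sketch of that multivariate step may help. The attribute names, factor levels and scores below are invented placeholders, not data from the study:

```python
# Hedged sketch: PCA of sensory-panel ratings for the factorial-design
# formulations. Attributes, factor levels and scores are invented
# placeholders, not data from the study.
import numpy as np
from sklearn.decomposition import PCA

# Rows: the four 2x2 factorial formulations plus the central point;
# columns: mean panel scores for assumed attributes
# (appearance, aroma, flavor, texture, overall acceptance).
scores = np.array([
    [6.8, 6.2, 6.5, 6.9, 6.6],   # low flour / low binder (assumed factors)
    [7.1, 6.8, 7.0, 6.4, 6.9],   # high flour / low binder
    [6.5, 6.0, 6.1, 6.7, 6.3],   # low flour / high binder
    [7.4, 7.0, 7.2, 6.8, 7.1],   # high flour / high binder
    [7.0, 6.6, 6.8, 6.7, 6.8],   # central point
])

pca = PCA(n_components=2)
coords = pca.fit_transform(scores)
print(pca.explained_variance_ratio_)  # variance captured per component
print(coords)                         # formulation map on PC1/PC2
```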
Abstract:
Understanding how aquatic species grow is fundamental in fisheries because stock assessment often relies on growth dependent statistical models. Length-frequency-based methods become important when more applicable data for growth model estimation are either not available or very expensive. In this article, we develop a new framework for growth estimation from length-frequency data using a generalized von Bertalanffy growth model (VBGM) framework that allows for time-dependent covariates to be incorporated. A finite mixture of normal distributions is used to model the length-frequency cohorts of each month with the means constrained to follow a VBGM. The variances of the finite mixture components are constrained to be a function of mean length, reducing the number of parameters and allowing for an estimate of the variance at any length. To optimize the likelihood, we use a minorization–maximization (MM) algorithm with a Nelder–Mead sub-step. This work was motivated by the decline in catches of the blue swimmer crab (BSC) (Portunus armatus) off the east coast of Queensland, Australia. We test the method with a simulation study and then apply it to the BSC fishery data.
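Because the model is specified quite concretely (mixture means constrained to a VBGM, variances a function of mean length), a small sketch can make the construction explicit. This is a hedged illustration, not the authors' code: the linear standard-deviation link, the softmax weights and all names are assumptions.

```python
# Hedged sketch, not the authors' implementation: a von Bertalanffy
# mean-length curve and the finite-mixture likelihood for one
# length-frequency sample.
import numpy as np
from scipy.stats import norm

def vb_mean_length(age, Linf, k, t0):
    """Von Bertalanffy growth model: mean length at age."""
    return Linf * (1.0 - np.exp(-k * (age - t0)))

def mixture_neg_loglik(params, lengths, cohort_ages):
    """Negative log-likelihood of a normal mixture whose component means
    follow the VBGM and whose SDs are a function of mean length."""
    Linf, k, t0, a_sd, b_sd = params[:5]
    w = np.exp(params[5:])
    w = w / w.sum()                          # mixture weights via softmax
    mus = vb_mean_length(np.asarray(cohort_ages, dtype=float), Linf, k, t0)
    sds = a_sd + b_sd * mus                  # variance tied to mean length
    dens = sum(wi * norm.pdf(lengths, mu, sd)
               for wi, mu, sd in zip(w, mus, sds))
    return -np.sum(np.log(dens + 1e-300))
```

The article optimizes the likelihood with a minorization-maximization algorithm with a Nelder-Mead sub-step; a generic stand-in here would be scipy.optimize.minimize(mixture_neg_loglik, x0, args=(lengths, cohort_ages), method="Nelder-Mead").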
Abstract:
The protein lysate array is an emerging technology for quantifying protein concentration ratios in multiple biological samples. It is gaining popularity and has the potential to answer questions about post-translational modifications and protein pathway relationships. Statistical inference for a parametric quantification procedure has been inadequately addressed in the literature, mainly due to two challenges: the increasing dimension of the parameter space and the need to account for dependence in the data. Each chapter of this thesis addresses one of these issues. In Chapter 1, an introduction to protein lysate array quantification is presented, followed by the motivations and goals for this thesis work. In Chapter 2, we develop a multi-step procedure for sigmoidal models, ensuring consistent estimation of the concentration level with full asymptotic efficiency. The results obtained in this chapter justify inferential procedures based on large-sample approximations. Simulation studies and real data analysis are used to illustrate the performance of the proposed method in finite samples. The multi-step procedure is simpler in both theory and computation than the single-step least squares method used in current practice. In Chapter 3, we introduce a new model that accounts for the dependence structure of the errors through a nonlinear mixed effects model. We consider a method to approximate the maximum likelihood estimator of all the parameters. Through simulation studies on various error structures, we show that for data with non-i.i.d. errors the proposed method leads to more accurate estimates and better confidence intervals than the existing single-step least squares method.
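As background for the quantification problem, lysate-array analyses fit a sigmoidal response to serial-dilution intensities. The following is a minimal sketch under assumed names, using a generic 4-parameter logistic rather than the thesis's sigmoidal model or its multi-step estimator:

```python
# Hedged sketch: a generic 4-parameter logistic calibration for one dilution
# series; all names and numbers are illustrative, not from the thesis.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, a, b, c, d):
    """Observed spot intensity as a 4PL function of dilution step x."""
    return a + b / (1.0 + np.exp(-c * (x - d)))

rng = np.random.default_rng(1)
x = np.arange(8, dtype=float)                        # log2 dilution steps
y = sigmoid(x, 0.5, 3.0, 1.2, 4.0) + rng.normal(0, 0.05, x.size)

popt, _ = curve_fit(sigmoid, x, y, p0=[0.0, 3.0, 1.0, 3.5])
print(popt)  # estimated (a, b, c, d); d shifts with the concentration level
```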
Abstract:
Datacenters have emerged as the dominant form of computing infrastructure over the last two decades. The tremendous increase in data analysis requirements has led to a proportional increase in power consumption, and datacenters are now among the fastest-growing electricity consumers in the United States. Another rising concern is the loss of throughput due to network congestion. Scheduling models that do not explicitly account for data placement may lead to large amounts of data being transferred over the network, causing unacceptable delays. In this dissertation, we study different scheduling models inspired by the dual objectives of minimizing energy costs and network congestion in a datacenter. Because datacenters are provisioned to handle peak workloads, average server utilization in most datacenters is very low. As a result, huge energy savings can be achieved by selectively shutting down machines when demand is low. In this dissertation, we introduce the network-aware machine activation problem, which asks for a schedule that simultaneously minimizes the number of machines used and the congestion incurred in the network. Our model significantly generalizes well-studied combinatorial optimization problems such as hard-capacitated hypergraph covering and is thus strongly NP-hard; we therefore focus on finding good approximation algorithms. Data-parallel computation frameworks such as MapReduce have popularized the design of applications that require a large amount of communication between machines. Efficient scheduling of these communication demands is essential to guarantee efficient execution of the different applications. In the second part of the thesis, we study the approximability of the co-flow scheduling problem, recently introduced to capture these application-level demands. Finally, we also study the question, "In what order should one process jobs?" Often, precedence constraints specify a partial order over the set of jobs, and the objective is to find suitable schedules that satisfy the partial order. In the presence of hard deadline constraints, however, it may be impossible to find a schedule that satisfies all precedence constraints. In this thesis we formalize different variants of job scheduling with soft precedence constraints and conduct the first systematic study of these problems.
Abstract:
Population aging in Québec is statistically indisputable. This misleading singular masks the many different ways of aging. For those who do not manage to age in good health, family solidarity, like institutional (that is, public) solidarity, is in principle meant to compensate for what is commonly called loss of autonomy. Public health policies in Québec organize home-support services on the condition that the person's situation has been assessed with the multiclientele assessment tool (outil d'évaluation multiclientèle, OEMC). It is in use throughout the health and social services network and is used by professionals, including social workers (SWs). Yet gerontology receives little attention in the initial training of SWs. We therefore asked what knowledge SWs mobilize when they assess. Since this knowledge is embedded in practice, we oriented the research toward activity theories, professional didactics, and the conceptual framework of mediation. We studied the activity of experienced social work professionals in order to identify some of the knowledge they mobilize and make it available for the training of social work students in Québec. One hundred and fifty hours of observation and twenty-two individual and group interviews were carried out with volunteer practitioners from the home-support service. Preliminary results of the research were presented in discussion groups with the SWs who had participated, and then with social work teachers. Our results describe the assessment procedures within the organization of the home-support service and distinguish them from the process of the activity by which the SW assesses the person's functional autonomy. We find that the knowledge mobilized by SWs rests, first, on a fine-grained knowledge of the territory, of the assessment tool and of the institutions. A second register of knowledge concerns the conceptualization of functional autonomy through the OEMC tool as the object and domain of SW intervention. Finally, a third register refers to the knowledge mobilized to enter into a relationship with older people and their entourage. These three registers of knowledge do not appear in the SWs' own discourse; they result from our analysis of their practice. The assessment of functional autonomy, analyzed through the concept of mediation, reveals the SW's relationship to knowledge. Since this is practice-based knowledge, we found that classifying it into the usual categories of theoretical versus practical knowledge was unworkable. We therefore borrow the vocabulary of professional didactics: operational invariants related to functional autonomy, and activity schemes related to the assessment activity. We thus identified two moments in the assessment. The first combines information gathering and data analysis. Functional autonomy is expressed in the person's conditions of existence along an axis running from mobility to cognition, with the person's safety and integrity as intervention markers. In this iterative process, the SW identifies with the person what interferes with their daily life. The assessment formulates how to resolve this and how the loss of autonomy could be compensated.

Information gathering and the SW's reasoning then form an iterative movement; the two elements of the process are linked and continuous. The second moment of the assessment appears if, during this iterative process, the SW perceives a dissonance. It is essential to identify its nature in order to take it into account and maintain the purpose of the activity, which is to assess functional autonomy for compensatory ends. The SW must identify the object of the dissonance in order to pin down, with the person, the need inherent in the loss of autonomy and consider remedying it. Taking this dissonance into account slows the course of the activity. The reasoning, until then tied to information gathering, dissociates from it in order to analyze, from the situation itself, what obstructs the assessment activity. The components that generate dissonance appear to be linked to daily life and to the person's living conditions at home (coherence/incoherence, refusal of services, self-neglect, abuse, aggressiveness). Dissonance generates a more complex activity for assessing the situation. Functional autonomy is still expressed along the mobility/cognition axis, with the person's safety and integrity as intervention markers. To handle this, SWs reason according to three schemes. In situations where deciding how to proceed requires referring to a norm (of the service, of the profession, etc.), the reasoning is deontological. There are also situations in which the SW acts according to values and representations belonging to their personal sphere; we call this reasoning instinctual. Finally, the SW can navigate between these two orientations and choose the path of clinical reasoning, which we describe as ethical and which then comes close to prudential practices marked by uncertainty.
Abstract:
The square root velocity framework is a method in shape analysis for defining a distance between curves and functional data. Identifying two curves if they differ by a reparametrization leads to the quotient space of unparametrized curves. In this paper we study analytical and topological aspects of this construction for the class of absolutely continuous curves. We show that the square root velocity transform is a homeomorphism and that the action of the reparametrization semigroup is continuous. We also show that, given two $C^1$-curves, there exist optimal reparametrizations realising the minimal distance between the unparametrized curves represented by them.
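For reference, the square root velocity transform and the induced distance on unparametrized curves are the standard ones; the notation below is background, not taken from the paper:

\[
q(t) \;=\; \frac{\dot c(t)}{\sqrt{\lVert \dot c(t)\rVert}}\,, \qquad
d\big([c_1],[c_2]\big) \;=\; \inf_{\gamma}\, \big\lVert\, q_1 - \sqrt{\dot\gamma}\,(q_2\circ\gamma) \,\big\rVert_{L^2},
\]

with the infimum taken over the reparametrization semigroup, which acts by \( (q,\gamma)\mapsto \sqrt{\dot\gamma}\,(q\circ\gamma) \).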
Abstract:
Research in the health field and the use of its results have served as a basis for improving the quality of care. This requires from health professionals knowledge in the specific area where they work and knowledge of research methodology, including observation techniques and techniques for data collection and analysis, so that they can more easily be competent readers of research results. Health professionals are privileged observers of human responses to health and illness and can contribute to the development and well-being of individuals, often in situations of great vulnerability. In child health and pediatrics the focus is on family-centered care, favoring the harmonious development of the child and adolescent and valuing measurable health outcomes that make it possible to determine the effectiveness of interventions and the quality of health and life. In the pediatric context we highlight evidence-based practice, the importance attributed to research and to the application of research results in clinical practice, and the development of standardized measurement instruments, namely rating scales of wide clinical use, which facilitate the appraisal and evaluation of the development and health of children and adolescents and result in health gains. The systematic observation of neonatal and pediatric populations with rating scales has been increasing, allowing a more balanced evaluation of children and an observation grounded in theory and in research results. Some of these aspects underpinned this work, which aims to meet three fundamental objectives. To meet the first objective, "To identify, in the scientific literature, the statistical tests most frequently used by researchers in child health and pediatrics when using rating scales", a systematic literature review was carried out, with the aim of analyzing scientific articles whose data-collection instruments were rating scales in the area of child and adolescent health built on ordinal variables, and of identifying the statistical tests applied to these variables. An exploratory analysis of the articles showed that researchers use different instruments with different ordinal measurement formats (3, 4, 5, 7 or 10 points) and apply parametric tests, nonparametric tests, or both simultaneously to this type of variable, whatever the sample size. The description of the methodology does not always make explicit whether the assumptions of the tests are met. The articles consulted do not always report the frequency distribution of the variables (symmetry/asymmetry) or the magnitude of the correlations between items. This literature supported the writing of two articles, one a systematic literature review and the other a theoretical reflection. Although some answers were found to the doubts faced by researchers and professionals who work with these instruments, there remains a need for simulation studies that confirm certain real situations and existing theory, and that address other aspects in which real scenarios can be framed, so as to ease the decision-making of researchers and clinicians who use rating scales.

To meet the second objective, "To compare the performance, in terms of power and Type I error probability, of the four parametric MANOVA statistics with two nonparametric MANOVA statistics when using randomly generated correlated ordinal variables", we developed a simulation study, using the Monte Carlo method, carried out in the R software. The design of the simulation study included a vector of three dependent variables, one independent variable (a factor with three groups), rating scales with 3, 4, 5 and 7 points, different marginal probabilities (p1 for a symmetric distribution, p2 for a positively skewed distribution, p3 for a negatively skewed distribution and p4 for a uniform distribution) in each of the three groups, correlations of low, medium and high magnitude (r=0.10, r=0.40, r=0.70, respectively), and six sample sizes (n=30, 60, 90, 120, 240, 300). The analysis of the results showed that Roy's largest root was the statistic with the highest estimates of Type I error probability and of power. The power of the tests behaves differently depending on the frequency distribution of the item responses, the magnitude of the correlations between items, the sample size, and the measurement format of the scale. Based on the frequency distribution, we considered three distinct situations: in the first (with marginal probabilities p1,p1,p4 and p4,p4,p1) the power estimates were very low across the different scenarios; in the second (with marginal probabilities p2,p3,p4; p1,p2,p3 and p2,p2,p3) power is high in samples of 60 or more observations for scales with 3, 4 and 5 points, and lower for scales with 7 points, reaching the same magnitude in samples of 120 observations, whatever the scenario; in the third situation (with marginal probabilities p1,p1,p2; p1,p2,p4; p2,p2,p1; p4,p4,p2 and p2,p2,p4), the stronger the correlations between items, the larger the number of scale points and the smaller the sample, the lower the power of the tests, with Wilks' lambda applied to the ranks being more powerful than all the other MANOVA statistics, with values immediately below Roy's largest root. Nevertheless, the power of the parametric and nonparametric tests is similar in samples larger than 90 observations (with correlations of low and medium magnitude between the dependent variables) for scales with 3, 4 and 5 points, and in samples larger than 240 observations, with low-intensity correlations, for scales with 7 points. In the simulation study, based on the frequency distribution, we concluded that in the first situation, across the different scenarios, power is low because MANOVA does not detect differences between groups given their similarity. In the second situation, across the different scenarios, power is high whenever the sample size exceeds 60 observations, so parametric tests can be applied.

In the third simulation situation, across the different scenarios, the smaller the sample, the stronger the correlations and the larger the number of scale points, the lower the power of the tests, with the highest power for Wilks' lambda applied to the ranks, followed by Pillai's trace applied to the ranks. Nevertheless, the power of the parametric and nonparametric tests is similar in larger samples with correlations of low and medium magnitude. To meet the third objective, "To frame the results of applying parametric MANOVA and nonparametric MANOVA to real data from rating scales with 3, 4, 5 and 7-point measurement formats within the results of the statistical simulation study", we used real data from the observation of newborns with the Early Feeding Skills (EFS) scale for oral feeding competencies, the assessment of skin lesion risk with the Neonatal Skin Risk Assessment Scale (NSRAS), and the assessment of functional independence in children and adolescents with spina bifida with the Functional Independence Measure (FIM). To analyze these scales, four practical applications were carried out that fit the scenarios of the simulation study. Age, weight and level of spinal cord lesion were the independent variables chosen to define the groups: newborns were grouped by gestational age classes and weight classes, and children and adolescents with spina bifida by age classes and levels of spinal cord lesion. The results with real data fit well within the simulation study.
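The simulation design above (three correlated ordinal dependent variables, a three-group factor, marginal probabilities controlling the shape of the distributions) can be rendered in a short sketch. The thesis ran its Monte Carlo study in R; this Python version, with a Gaussian copula for the ordinal generation and a rank transform as a stand-in for the nonparametric MANOVA, is illustrative only:

```python
# Hedged sketch: correlated ordinal data and the four MANOVA statistics.
import numpy as np
import pandas as pd
from scipy.stats import norm, rankdata
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)

def ordinal_sample(n, marg_probs, r):
    """n observations of 3 correlated ordinal items with given marginals."""
    cov = np.full((3, 3), r)
    np.fill_diagonal(cov, 1.0)
    z = rng.multivariate_normal(np.zeros(3), cov, size=n)
    cuts = norm.ppf(np.cumsum(marg_probs)[:-1])   # thresholds from marginals
    return np.digitize(z, cuts) + 1               # categories 1..k

p_sym = [0.1, 0.2, 0.4, 0.2, 0.1]                 # symmetric 5-point marginal
groups = [ordinal_sample(60, p_sym, 0.4) for _ in range(3)]
df = pd.DataFrame(np.vstack(groups), columns=["y1", "y2", "y3"])
df["group"] = np.repeat(["g1", "g2", "g3"], 60)

# Parametric MANOVA: Wilks, Pillai, Hotelling-Lawley, Roy on the raw scores.
print(MANOVA.from_formula("y1 + y2 + y3 ~ group", data=df).mv_test())

# Rank-transform variant, a common stand-in for nonparametric MANOVA.
df[["y1", "y2", "y3"]] = df[["y1", "y2", "y3"]].apply(rankdata)
print(MANOVA.from_formula("y1 + y2 + y3 ~ group", data=df).mv_test())
```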
Abstract:
Objectives: To determine the frequency of vaccination in older adults within the city of Bogotá and to estimate its association with sociodemographic and health factors. Methods: This is a secondary analysis of data from the SABE-Bogotá Study, a cross-sectional population-based study that included a total of 2,000 persons aged 60 years or older. Weighted percentages of self-reported vaccination (influenza, pneumococcal, tetanus) were determined, and the association between vaccination and covariates was evaluated with logistic regression models. Results: A total of 73.0% of respondents received the influenza vaccine, 57.8% the pneumococcal vaccine and 47.6% the tetanus vaccine. Factors independently associated with vaccination included: 1- age (those aged 65-74 years had higher odds of receiving vaccinations than those aged 60-64 years); 2- socioeconomic status (SES) (higher SES was associated with lower odds of influenza and pneumococcal vaccination than lower SES); 3- health insurance (those with contributive or subsidized health insurance had higher odds, between 3 and 5 times higher, of vaccination than those with no insurance); 4- functional status (greater Lawton scores were associated with increased odds of all vaccinations); 5- comorbidity (higher comorbidity was associated with increased odds of influenza and pneumococcal vaccination). Conclusion: Vaccination campaigns should be strengthened to increase coverage, especially in groups that are more reluctant to vaccinate or harder to reach, such as disabled older adults.
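The analysis step is a standard logistic regression of self-reported vaccination on covariates. As a hedged sketch (synthetic data and invented column names, not SABE-Bogotá variables), odds ratios can be obtained as follows:

```python
# Hedged sketch with synthetic data and invented column names:
# odds ratios from a logistic regression of self-reported influenza
# vaccination on two covariates.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "influenza_vax": rng.binomial(1, 0.73, n),           # 0/1 self-report
    "age_group": rng.choice(["60-64", "65-74", "75+"], n),
    "lawton_score": rng.integers(0, 9, n),               # functional status
})

fit = smf.logit("influenza_vax ~ C(age_group) + lawton_score", data=df).fit()
print(np.exp(fit.params))   # exponentiated coefficients = odds ratios
```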
Abstract:
Social capital, or social cohesion and group connectedness, can influence both HIV risk behavior and substance use. Because recent immigrants undergo a change in environment, one consequence can be a change in social capital, and there may be an association among changes in social capital, HIV risk behavior and substance use post-immigration. The dissertation focused on the interface of these three variables among recent Latino immigrants (RLIs) in South Florida. The first manuscript is a systematic review of social capital and HIV risk behavior, which served as partial background for the second and third manuscripts. Twelve papers with a measure of social capital as the independent variable and HIV risk as the dependent variable were included in the analysis. Eleven studies measured social capital at the individual level, and one study measured it at the group level. HIV risk was influenced by social capital, but the type of influence depended on the type of social capital and on the study population. Cognitive social capital, or levels of collective action, was protective against HIV in both men and women. The role of structural social capital, or levels of civic engagement and group participation, in HIV risk depended on the type of structural social capital and varied by gender. Microfinance programs and functional group participation were protective for women, while dysfunctional group participation and peer-level support may have increased HIV risk among men. The second manuscript was an original study assessing changes in social capital and HIV risk behavior from pre- to post-immigration among RLIs in South Florida (n=527). HIV risk behavior was assessed through the frequency of vaginal-penile condom use and the number of sexual partners. It was a longitudinal study using secondary data analysis to assess changes in social capital and HIV risk behavior from pre-immigration to two years post-immigration and to determine whether there was a relationship between the two variables. There was an 8% decrease in total social capital (p < .05). Reporting of "never use" of condoms in the past 90 days increased in all subcategories (p < .05). Single men had a decrease in the number of sexual partners (p < .05). Lower social capital measured on the "friend and other" dimension was marginally associated with fewer sexual partners. The third manuscript was another original study examining the association between social capital and substance use among RLIs in South Florida (n=527). Substance use was measured by the frequency of hazardous alcohol drinking and of illicit drug use. It was a longitudinal study of social capital and substance use from pre-immigration to two years post-immigration. Post-immigration, social capital, hazardous drinking and illicit drug use all decreased (p < .001). After adjusting for time, females were less likely than males to engage in hazardous drinking (OR=.31, p < .001) and less likely to engage in illicit drug use (OR=.67, p=.01). Documentation status was a moderator between social capital and illicit drug use. "Business" and "agency" social capital were associated with changes in illicit drug use for documented immigrants. After adjusting for gender and marital status, on average, documented immigrants with a one-unit increase in "business" social capital were 1.2 times more likely to engage in illicit drug use (p < .01), and documented immigrants with a one-unit increase in "agency" social capital were 38% less likely to engage in illicit drug use (p < .01).

"Friend and other" social capital was associated with a decrease in illicit drug use among undocumented immigrants. After adjusting for gender and marital status, on average, undocumented immigrants with a one-unit increase in "friend and other" social capital were 45% less likely to engage in hazardous drinking and 44% less likely to use illicit drugs (p < .01 and p < .05, respectively). Studying these three domains is relevant because HIV continues to be a public health issue, particularly in Miami-Dade County, which ranks among the U.S. regions with the highest HIV/AIDS prevalence. Substance use is associated with HIV risk behavior; in most studies, increased substance use is associated with a greater likelihood of HIV risk behavior. Immigration, the hypothesized catalyst for the change in social capital, affects the dynamics of a society. Greater immigration can burden the host country's societal resources; however, immigrants are also a potential source of additional skilled labor for the workforce. The successful adaptation of immigrants can therefore have a positive influence on receiving communities. With Florida being a major receiver of immigrants to the U.S., this dissertation addresses an important public health issue for South Florida and the U.S. at large.
Abstract:
An overview is given of a user interaction monitoring and analysis framework called BaranC. Monitoring and analysing human-digital interaction is an essential part of developing a user model as the basis for investigating user experience. The primary human-digital interaction, such as on a laptop or smartphone, is best understood and modelled in the wider context of the user and their environment. The BaranC framework provides monitoring and analysis capabilities that not only record all user interaction with a digital device (e.g. a smartphone) but also collect all available context data (such as from sensors in the digital device itself, a fitness band or smart appliances). The data collected by BaranC are recorded as a User Digital Imprint (UDI), which is, in effect, the user model and provides the basis for data analysis. BaranC provides functionality that is useful for user experience studies, user interface design evaluation, and user assistance services. An important concern for personal data is privacy, and the framework gives the user full control over the monitoring, storing and sharing of their data.
Abstract:
Big data are reshaping the way we interact with technology, fostering new applications that increase the safety assessment of foods. An extraordinary amount of information is analysed using machine learning approaches aimed at detecting the existence, or predicting the likelihood, of future risks. Food business operators have to share the results of these analyses when applying to place regulated products on the market, while agri-food safety agencies (including the European Food Safety Authority) are exploring new avenues to increase the accuracy of their evaluations by processing Big data. Such an informational endowment brings with it opportunities and risks correlated with the extraction of meaningful inferences from data. However, conflicting interests and tensions among the entities involved (the industry, food safety agencies, and consumers) hinder the adoption of shared methods to steer the processing of Big data in a sound, transparent and trustworthy way. A recent reform of the EU sectoral legislation, the lack of trust, and the presence of a considerable number of stakeholders highlight the need for ethical contributions aimed at steering the development and deployment of Big data applications. Moreover, the Artificial Intelligence guidelines and charters published by European Union institutions and Member States have to be discussed in light of applied contexts, including the one at stake here. This thesis aims to contribute to these goals by discussing which principles should be put forward when processing Big data in the context of agri-food safety risk assessment. The research focuses on two intertwined topics, data ownership and data governance, evaluating how the regulatory framework addresses the challenges raised by Big data analysis in these domains. The outcome of the project is a tentative Roadmap that identifies the principles to be observed when processing Big data in this domain and their possible implementations.
Abstract:
The world of Computational Biology and Bioinformatics presently integrates many different areas of expertise, including computer science and electronic engineering. A major aim in Data Science is the development and tuning of specific computational approaches to interpret the complexity of Biology. Molecular biologists and medical doctors rely heavily on interdisciplinary experts capable of understanding the biological background and applying algorithms to find optimal solutions to their problems. With this problem-solving orientation, I was involved in two basic research fields: Cancer Genomics and Enzyme Proteomics. What I developed and implemented can therefore be considered a general effort to support data analysis both in Cancer Genomics and in Enzyme Proteomics, focusing on the enzymes that catalyse all the biochemical reactions in cells. Specifically, in Cancer Genomics I contributed to the characterization of the intratumoral immune microenvironment in gastrointestinal stromal tumours (GISTs), correlating immune cell population levels with tumour subtypes. I was involved in the setup of strategies for the evaluation and standardization of different approaches for fusion transcript detection in sarcomas that can be applied in routine diagnostics. This was part of a coordinated effort of the Sarcoma working group of "Alleanza Contro il Cancro". In Enzyme Proteomics, I generated a derived database collecting all the human proteins and enzymes known to be associated with genetic disease. I curated the data search in freely available databases such as PDB, UniProt, Humsavar and ClinVar, and I was responsible for searching, updating and handling the information content, and for computing statistics. I also developed a web server, BENZ, which allows researchers to annotate an enzyme sequence with the corresponding Enzyme Commission number, the key feature fully describing the catalysed reaction. In addition, I contributed substantially to the characterization of enzyme-genetic disease associations, for a better classification of metabolic genetic diseases.
Abstract:
Model misspecification affects the classical test statistics used to assess the fit of Item Response Theory (IRT) models. Robust tests, such as the Generalized Lagrange Multiplier and Hausman tests, have been derived under model misspecification, but their use has not been widely explored in the IRT framework. In the first part of the thesis, we introduce the Generalized Lagrange Multiplier test to detect differential item response functioning in IRT models for binary data under model misspecification. By means of a simulation study and a real data analysis, we compare its performance with the classical Lagrange Multiplier test, computed using the Hessian and the cross-product matrix, and with the Generalized Jackknife Score test. The power of these tests is computed both empirically and asymptotically. The misspecifications considered are local dependence among items and a non-normal distribution of the latent variable. The results highlight that under mild model misspecification all tests perform well, while under strong model misspecification their performance deteriorates. None of the tests considered shows overall superior performance to the others. In the second part of the thesis, we extend the Generalized Hausman test to detect non-normality of the latent variable distribution. To build the test, we consider a semi-nonparametric IRT model, which assumes a more flexible latent variable distribution. By means of a simulation study and two real applications, we compare the performance of the Generalized Hausman test with the M2 limited-information goodness-of-fit test and the Likelihood-Ratio test. Additionally, information criteria are computed. The Generalized Hausman test performs better than the Likelihood-Ratio test in terms of Type I error rates and better than the M2 test in terms of power. The performance of the Generalized Hausman test and of the information criteria deteriorates when the sample size is small and there are few items.
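For background (notation assumed here, not taken from the thesis), a binary IRT model of the kind under test is the two-parameter logistic:

\[
P(X_{ij}=1 \mid \theta_i) \;=\; \frac{\exp\{a_j(\theta_i-b_j)\}}{1+\exp\{a_j(\theta_i-b_j)\}}, \qquad \theta_i \sim N(0,1),
\]

where local independence of the items \(X_{i1},\dots,X_{ip}\) given \(\theta_i\) and normality of \(\theta_i\) are exactly the two assumptions whose violation (local dependence, non-normal latent distribution) the tests above are designed to detect.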
Abstract:
To gain a deeper understanding of the problem companies face in training people able to manage innovation processes, in particular Open Innovation (OI), a multiple case study was carried out in 2021 on a non-formal OI education program run by the consortium company ART-ER and addressed to doctoral students of the universities of Emilia-Romagna. In the second phase of this training program, four working tables were set up to respond to the OI challenges posed by the companies. Each working table involved three or four doctoral students, two company representatives, a consultant and an ART-ER staff member. The overall sample consisted of 14 doctoral students, 8 representatives of four companies, 4 members of a consulting firm and 4 staff members of the ART-ER consortium company. The following research question guided the study: does the interaction among the participants of each working table, considered a single case, take the form of a Community of Practice capable of fostering the development of individual learning functional to managing the OI processes activated in the companies? Data were collected through desk-based documentary research, focus groups, semi-structured interviews and a semi-structured online questionnaire. Data analysis was carried out through a multi-stage qualitative content analysis with the aid of the MAXQDA software. The results show that in three out of four cases the working tables took the form of a Community of Practice. In these three tables, moreover, the development of some areas of competence functional to the management of OI processes emerged. The conclusion presents some proposals for redesigning future editions of the training program.