960 results for data sets


Relevance:

60.00%

Publisher:

Abstract:

Metabolomics is a rapidly growing research field that studies the response of biological systems to environmental factors, disease states and genetic modifications. It aims at measuring the complete set of endogenous metabolites, i.e. the metabolome, in a biological sample such as plasma or cells. Because metabolites are the intermediates and end products of biochemical reactions, metabolite compositions and metabolite levels in biological samples can provide a wealth of information on on-going processes in a living system. Due to the complexity of the metabolome, metabolomic analysis poses a challenge to analytical chemistry. Adequate sample preparation is critical to accurate and reproducible analysis, and the analytical techniques must have high resolution and sensitivity to allow detection of as many metabolites as possible. Furthermore, as the information contained in the metabolome is immense, the data set collected from metabolomic studies is very large. In order to extract the relevant information from such large data sets, efficient data processing and multivariate data analysis methods are needed. In the research presented in this thesis, metabolomics was used to study mechanisms of polymeric gene delivery to retinal pigment epithelial (RPE) cells. The aim of the study was to detect differences in metabolomic fingerprints between transfected cells and non-transfected controls, and thereafter to identify metabolites responsible for the discrimination. The plasmid pCMV-β was introduced into RPE cells using the vector polyethyleneimine (PEI). The samples were analyzed using high performance liquid chromatography (HPLC) and ultra performance liquid chromatography (UPLC) coupled to a triple quadrupole (QqQ) mass spectrometer (MS). The software MZmine was used for raw data processing and principal component analysis (PCA) was used in statistical data analysis.
The results revealed differences in metabolomic fingerprints between transfected cells and non-transfected controls. However, reliable fingerprinting data could not be obtained because of low analysis repeatability. Therefore, no attempts were made to identify metabolites responsible for discrimination between sample groups. Repeatability and accuracy of analyses can be influenced by protocol optimization. However, in this study, optimization of analytical methods was hindered by the very small number of samples available for analysis. In conclusion, this study demonstrates that obtaining reliable fingerprinting data is technically demanding, and the protocols need to be thoroughly optimized in order to approach the goals of gaining information on mechanisms of gene delivery.

Relevance:

60.00%

Publisher:

Abstract:

Habitat fragmentation produces patches of suitable habitat surrounded by unfavourable matrix habitat. A species may persist in such a fragmented landscape in an equilibrium between the extinctions and recolonizations of local populations, thus forming a metapopulation. Migration between local populations is necessary for the long-term persistence of a metapopulation. The Glanville fritillary butterfly (Melitaea cinxia) forms a metapopulation in the Åland islands in Finland. There is migration between the populations, the extent of which is affected by several environmental factors and variation in the phenotype of individual butterflies. Different allelic forms of the glycolytic enzyme phosphoglucose isomerase (Pgi) have been identified as a possible genetic factor influencing flight performance and migration rate in this species. The frequency of a certain Pgi allele, Pgi-f, follows the same pattern in relation to population age and connectivity as migration propensity. Furthermore, variation in flight metabolic performance, which is likely to affect migration propensity, has been linked to genetic variation in Pgi or a closely linked locus. The aim of this study was to investigate the association between Pgi genotype and the migration propensity in the Glanville fritillary both at the individual and population levels using a statistical modelling approach. A mark-release-recapture (MRR) study was conducted in a habitat patch network of M. cinxia in Åland to collect data on the movements of individual butterflies. Larval samples from the study area were also collected for population level examinations. Each butterfly and larva was genotyped at the Pgi locus. The MRR data were parameterised with two mathematical models of migration: the Virtual Migration Model (VM) and the spatially explicit diffusion model. The numbers of emigrants predicted by the VM model were compared with the observed numbers for populations with high and low frequencies of Pgi-f.
Posterior predictive data sets were simulated based on the parameters of the diffusion model. Lack of fit between the observed and model-predicted values of several movement descriptors was detected, and the effect of Pgi genotype on the deviations was assessed by randomizations including the genotype information. This study revealed a possible difference in the effect of Pgi genotype on migration propensity between the two sexes in the Glanville fritillary. The females with and males without the Pgi-f allele moved more between habitat patches, which is probably related to differences in the function of flight in the two sexes. Females may use their high flight capacity to migrate between habitat patches to find suitable oviposition sites, whereas males may use it to acquire mates by keeping a territory and fighting off other intruding males, possibly causing them to emigrate. The results were consistent across different movement descriptors and at the individual and population levels. The effect of Pgi is likely to be dependent on the structure of the landscape and the prevailing environmental conditions.

Relevance:

60.00%

Publisher:

Abstract:

This study examines the factors affecting the strategic decisions of non-industrial private forest (NIPF) landowners in management planning. A genetic algorithm is used to induce a set of rules predicting the potential cut of the landowners' preferred timber management strategies. The rules are based on variables describing the characteristics of the landowners and their forest holdings. The predictive ability of the genetic algorithm is compared to linear regression analysis using identical data sets. The data are cross-validated seven times applying both genetic algorithm and regression analyses in order to examine the data-sensitivity and robustness of the generated models. The optimal rule set derived from genetic algorithm analyses included the following variables: mean initial volume, landowner's positive price expectations for the next eight years, landowner being classified as farmer, and preference for the recreational use of forest property. When tested with previously unseen test data, the optimal rule set resulted in a relative root mean square error of 0.40. In the regression analyses, the optimal regression equation consisted of the following variables: mean initial volume, proportion of forestry income, intention to cut extensively in future, and positive price expectations for the next two years. The R² of the optimal regression equation was 0.34 and the relative root mean square error obtained from the test data was 0.38. In both models, mean initial volume and positive stumpage price expectations were entered as significant predictors of the potential cut of the preferred timber management strategy. When tested with the complete data set of 201 observations, both the optimal rule set and the optimal regression model achieved the same level of accuracy.
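
The two accuracy measures quoted in this abstract can be illustrated with a minimal sketch. It assumes that "relative root mean square error" means the RMSE divided by the mean observed value, which the abstract does not state. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def relative_rmse(observed, predicted):
    """RMSE divided by the mean observed value (assumed definition)."""
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse) / (sum(observed) / len(observed))


# Hypothetical potential-cut values, for illustration only.
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]
print("relative RMSE:", relative_rmse(observed, predicted))
print("R^2:", r_squared(observed, predicted))
```

Under this definition, a relative RMSE of 0.40 would mean that a typical prediction error is about 40% of the mean observed potential cut.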

Relevance:

60.00%

Publisher:

Abstract:

A sensitive framework has been developed for modelling young radiata pine survival, its growth and its size class distribution, from time of planting to age 5 or 6 years. The data and analysis refer to the Central North Island region of New Zealand. The survival function is derived from a Weibull probability density function, to reflect diminishing mortality with the passage of time in young stands. An anamorphic family of trends was used, as very little between-tree competition can be expected in young stands. An exponential height function was found to fit best the lower portion of its sigmoid form. The most appropriate basal area/ha exponential function included an allometric adjustment which resulted in compatible mean height and basal area/ha models. Each of these equations successfully represented the effects of several establishment practices by making coefficients linear functions of site factors, management activities and their interactions. Height and diameter distribution modelling techniques that ensured compatibility with stand values were employed to represent the effects of management practices on crop variation. Model parameters for this research were estimated using data from site preparation experiments in the region and were tested with some independent data sets.
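
The link between a Weibull form and diminishing mortality can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard two-parameter Weibull survival function, and the shape and scale values are hypothetical rather than the coefficients fitted in the study. With a shape parameter below 1, the hazard (instantaneous mortality rate) falls as the stand ages.

```python
import math


def weibull_survival(t, shape, scale):
    """Proportion of trees surviving to age t: exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Instantaneous mortality rate at age t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical parameters; shape < 1 gives mortality that diminishes with age.
shape, scale = 0.5, 40.0
for age in (1, 2, 3, 4, 5):
    survival = weibull_survival(age, shape, scale)
    hazard = weibull_hazard(age, shape, scale)
    print(age, round(survival, 3), round(hazard, 4))
```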

Relevance:

60.00%

Publisher:

Abstract:

Understorey trees that develop beneath even-aged forest stands are significant for timber harvesting, forest regeneration, visibility and landscape analyses, and the assessment of biodiversity and carbon balance. Airborne laser scanning has proven to be an efficient remote sensing method for measuring mature tree stands. The adoption of laser scanning in operational forest planning makes it possible to produce more accurate information on the understorey than before, provided that understorey properties can be interpreted from laser data. This study used accurately measured field plots and discrete-return LiDAR data sets from several years (flying altitude 1–2 km, 0.9–9.7 pulses per m²). The laser scanning data had been acquired with Optech ALTM3100 and Leica ALS50-II sensors. The plots represent Finnish even-aged pine stands at different stages of development. The research questions were: 1) What is the laser signal from the understorey like at the level of an individual pulse, and which factors affect it? 2) What is the explanatory power of the area-based laser features used in practical applications for predicting the properties of understorey trees? In particular, the aim was to determine how energy losses of the laser pulse to the upper canopy layers affect the received signal, and whether the intensity of laser echoes can be corrected for these energy losses. Differences in echo intensity between tree species were small and varied from one scan to another. The potential of intensity for interpreting understorey tree species is therefore very limited. Energy losses to the upper canopy layers introduced noise into the laser signal from the understorey. The energy-loss correction was applied to the second and third echoes of laser pulses returned from the understorey. The correction reduced the within-target variation in intensity and improved the classification accuracy of targets in the understorey layer. With second echoes, the percentage of correct classifications between ground and the most common tree species was 49.2–54.9% before the correction and 57.3–62.0% after it. The corresponding kappa values were 0.03–0.13 and 0.10–0.22. The most important factor explaining the energy losses was the intensity of the earlier echoes from the same pulse, although the geometry of the pulse's intersection with the trees of the upper canopy layer also had some influence. Classification accuracy also improved for third echoes. Tree species differed in how readily they produce an echo when a laser pulse hits the tree. Spruce produced an echo with a higher probability than broadleaved trees. This difference was especially clear for pulses that had undergone energy losses. Height-distribution features of laser echoes may therefore depend on tree species. Clear differences in intensity distributions were observed between sensors, which makes it harder to combine data sets acquired with different sensors. Echo probabilities also differed somewhat between sensors, causing small differences in the echo height distributions. Among the area-based laser features, some explained understorey stem number and mean height well when the analysis was restricted to trees taller than 1 m. The explanatory power of the features was better for stem number than for mean height. Explanatory power did not decrease significantly as pulse density decreased, which is encouraging for practical applications. The proportion of broadleaved trees could not be explained. Based on the results, discrete-return laser scanning may be usable, for example, in assessing the need for pre-harvest clearing. More detailed classification of the understorey (e.g. tree species interpretation), however, may be difficult. The very smallest understorey trees cannot be detected. Further studies are needed to generalize the results to different kinds of stands.
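
The relation between the percentage of correct classifications and the kappa values reported in this abstract can be illustrated with a small sketch of Cohen's kappa. The confusion matrix below is hypothetical and only chosen so that the accuracy falls in the reported range; it is not data from the study.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: reference, columns: predicted)."""
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of cases on the diagonal.
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Agreement expected by chance from the row and column totals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical two-class case (ground vs. most common tree species).
confusion = [[30, 20],
             [20, 30]]
accuracy = (confusion[0][0] + confusion[1][1]) / 100
print("accuracy:", accuracy)
print("kappa:", cohen_kappa(confusion))
```

With two balanced classes, chance agreement alone is 50%, which is why an accuracy near 60% corresponds to a kappa of only about 0.2.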

Relevance:

60.00%

Publisher:

Abstract:

Gaussian Processes (GPs) are promising Bayesian methods for classification and regression problems. They have also been used for semi-supervised learning tasks. In this paper, we propose a new algorithm for solving semi-supervised binary classification problems using sparse GP regression (GPR) models. It is closely related to semi-supervised learning based on support vector regression (SVR) and maximum margin clustering. The proposed algorithm is simple and easy to implement. Unlike the SVR-based algorithm, it gives a sparse solution directly. Also, the hyperparameters are estimated easily without resorting to expensive cross-validation techniques. Use of a sparse GPR model helps make the proposed algorithm scalable. Preliminary results on synthetic and real-world data sets demonstrate the efficacy of the new algorithm.

Relevance:

60.00%

Publisher:

Abstract:

An algorithm to generate a minimal spanning tree is presented when the nodes with their coordinates in some m-dimensional Euclidean space and the corresponding metric are given. This algorithm is tested on manually generated data sets. The worst case time complexity of this algorithm is O(n log2n) for a collection of n data samples.
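
For orientation, the problem setting (nodes given by coordinates in m-dimensional Euclidean space) can be sketched with a textbook baseline. The code below is a dense version of Prim's algorithm with quadratic running time; it is not the algorithm presented in the paper, whose details the abstract does not give, and the function names are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def prim_mst(points):
    """Return minimal spanning tree edges as (i, j, length) tuples.

    Dense Prim's algorithm, O(n^2) distance evaluations.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n  # cheapest known link from the tree to each node
    best_from = [-1] * n        # tree node that provides that link
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, best_dist[u]))
        # Update the cheapest links of the remaining nodes.
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


# Small 3-dimensional example with hypothetical coordinates.
sample = [(0, 0, 0), (1, 0, 0), (3, 0, 0), (3, 4, 0)]
for edge in prim_mst(sample):
    print(edge)
```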

Relevance:

60.00%

Publisher:

Abstract:

We propose an efficient and parameter-free scoring criterion, the factorized conditional log-likelihood (f̂CLL), for learning Bayesian network classifiers. The proposed score is an approximation of the conditional log-likelihood criterion. The approximation is devised in order to guarantee decomposability over the network structure, as well as efficient estimation of the optimal parameters, achieving the same time and space complexity as the traditional log-likelihood scoring criterion. The resulting criterion has an information-theoretic interpretation based on interaction information, which exhibits its discriminative nature. To evaluate the performance of the proposed criterion, we present an empirical comparison with state-of-the-art classifiers. Results on a large suite of benchmark data sets from the UCI repository show that f̂CLL-trained classifiers achieve at least as good accuracy as the best compared classifiers, using significantly less computational resources.

Relevance:

60.00%

Publisher:

Abstract:

The K-means algorithm for clustering is very much dependent on the initial seed values. We use a genetic algorithm to find a near-optimal partitioning of the given data set by selecting proper initial seed values in the K-means algorithm. Results obtained are very encouraging and in most of the cases, on data sets having well separated clusters, the proposed scheme reached a global minimum.
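
The idea of searching over initial seeds with a genetic algorithm can be sketched as follows. This is a simplified illustration under stated assumptions, not the scheme from the paper: an individual is a tuple of K point indices used as seeds, fitness is the sum of squared errors (SSE) after running K-means from those seeds, and the population size, crossover, and mutation rate are hypothetical choices.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_sse(points, seed_idx, iters=20):
    """Run Lloyd's K-means from the given seed points and return the final SSE."""
    centers = [points[i] for i in seed_idx]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            tuple(sum(x) / len(members) for x in zip(*members)) if members else centers[c]
            for c, members in enumerate(clusters)
        ]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def ga_seeds(points, k, pop_size=10, generations=15, rng=None):
    """Search for good K-means seeds; returns (seed indices, SSE)."""
    rng = rng or random.Random(0)
    n = len(points)
    population = [tuple(rng.sample(range(n), k)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda s: kmeans_sse(points, s))
        parents = ranked[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(set(a) | set(b))  # crossover: mix the parents' seeds
            child = rng.sample(pool, k)
            if rng.random() < 0.2:          # mutation: swap in a random point
                new_idx = rng.randrange(n)
                if new_idx not in child:
                    child[rng.randrange(k)] = new_idx
            children.append(tuple(child))
        population = parents + children
    best = min(population, key=lambda s: kmeans_sse(points, s))
    return best, kmeans_sse(points, best)


# Two well-separated hypothetical clusters.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
seeds, sse = ga_seeds(data, 2)
print(seeds, sse)
```

Because the best half of each generation is carried over unchanged, the best SSE in the population never gets worse from one generation to the next.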

Relevance:

60.00%

Publisher:

Abstract:

Our study concerns an important current problem, that of diffusion of information in social networks. This problem has received significant attention from the Internet research community in recent times, driven by many potential applications such as viral marketing and sales promotions. In this paper, we focus on the target set selection problem, which involves discovering a small subset of influential players in a given social network, to perform a certain task of information diffusion. The target set selection problem manifests in two forms: 1) top-k nodes problem and 2) lambda-coverage problem. In the top-k nodes problem, we are required to find a set of k key nodes that would maximize the number of nodes being influenced in the network. The lambda-coverage problem is concerned with finding a set of k key nodes having minimal size that can influence a given percentage lambda of the nodes in the entire network. We propose a new way of solving these problems using the concept of Shapley value, which is a well-known solution concept in cooperative game theory. Our approach leads to algorithms which we call the ShaPley value-based Influential Nodes (SPINs) algorithms for solving the top-k nodes problem and the lambda-coverage problem. We compare the performance of the proposed SPIN algorithms with well-known algorithms in the literature. Through extensive experimentation on four synthetically generated random graphs and six real-world data sets (Celegans, Jazz, NIPS coauthorship data set, Netscience data set, High-Energy Physics data set, and Political Books data set), we show that the proposed SPIN approach is more powerful and computationally efficient. Note to Practitioners: In recent times, social networks have received a high level of attention due to their proven ability in improving the performance of web search, recommendations in collaborative filtering systems, spreading a technology in the market using viral marketing techniques, etc.
It is well known that the interpersonal relationships (or ties or links) between individuals cause change or improvement in the social system because the decisions made by individuals are influenced heavily by the behavior of their neighbors. An interesting and key problem in social networks is to discover the most influential nodes in the social network which can influence other nodes in the social network in a strong and deep way. This problem is called the target set selection problem and has two variants: 1) the top-k nodes problem, where we are required to identify a set of k influential nodes that maximize the number of nodes being influenced in the network and 2) the lambda-coverage problem which involves finding a set of influential nodes having minimum size that can influence a given percentage lambda of the nodes in the entire network. There are many existing algorithms in the literature for solving these problems. In this paper, we propose a new algorithm which is based on a novel interpretation of information diffusion in a social network as a cooperative game. Using this analogy, we develop an algorithm based on the Shapley value of the underlying cooperative game. The proposed algorithm outperforms the existing algorithms in terms of generality or computational complexity or both. Our results are validated through extensive experimentation on both synthetically generated and real-world data sets.
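
The use of Shapley values to rank nodes can be illustrated with a minimal sketch. It is not the SPIN algorithm itself: the characteristic function below (the number of nodes that are in a coalition or adjacent to it) is an assumed stand-in for an influence model, the values are computed exactly by enumerating all orderings, which is only feasible for very small graphs, and all names are hypothetical.

```python
from itertools import permutations


def coverage(graph, coalition):
    """Number of nodes that are in the coalition or adjacent to it (assumed game)."""
    covered = set(coalition)
    for node in coalition:
        covered.update(graph[node])
    return len(covered)


def shapley_values(graph):
    """Exact Shapley values: average marginal contribution over all orderings."""
    nodes = list(graph)
    totals = {node: 0.0 for node in nodes}
    count = 0
    for order in permutations(nodes):
        coalition = []
        previous = 0
        for node in order:
            coalition.append(node)
            current = coverage(graph, coalition)
            totals[node] += current - previous
            previous = current
        count += 1
    return {node: total / count for node, total in totals.items()}


def top_k(graph, k):
    """Return the k nodes with the largest Shapley values."""
    values = shapley_values(graph)
    return sorted(values, key=values.get, reverse=True)[:k]


# Hypothetical star graph: node 0 is linked to nodes 1, 2 and 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(shapley_values(star))
print(top_k(star, 1))
```

For larger networks, exact enumeration is impractical, and a sketch like this would have to sample random orderings instead of listing them all.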

Relevance:

60.00%

Publisher:

Abstract:

The lifestyles of people living in single-family housing areas on the outskirts of the Greater Helsinki Region (GHR) differ from those of people living in the inner city. The urban structure of the GHR is concentrated in the capital on the one hand, and spread out across the outskirts on the other. Socioeconomic spatial divisions are evident as well-paid and educated residents move to the inner city or the single-family house dominated suburban neighbourhoods depending on their housing preferences and life situations. The following thesis explores how these lifestyles have emerged through the housing choices and daily mobility of the residents living in the new single-family housing areas on the outskirts of the GHR and the inner city. The study shows that, when it comes to lifestyles, residents on the outskirts of the region have different housing preferences and daily mobility patterns when compared with their inner city counterparts. Based on five different case study areas, my results show that these differences are related to residents' values, preferences and attitudes towards the neighbourhood, on the one hand, and limited by urban structure on the other. This also confirms earlier theoretical analyses and findings from the GHR. Residents who moved to the outskirts of the Greater Helsinki Region and the apartment buildings of the inner city were similar in the basic elements of their housing preferences: they sought a safe and peaceful neighbourhood close to the natural environment. However, where housing choices, daily mobility and activities vary, different lifestyles develop in both the outskirts and the inner city. More specifically, lifestyles in the city apartment blocks were inherently urban. Liveliness and highest order facilities were appreciated and daily mobility patterns were supported by diverse modes of transportation for the purposes of work, shopping and leisure time.
On the outskirts, by contrast, lifestyles were largely post-suburban and child-friendliness was appreciated. Due to the heterarchical urban structure, daily mobility was more car-dependent since the work, shopping and free time activities of the residents were more spread around the region. The urban structure frames the daily mobility on the outskirts of the region, but this is not to say that short local trips replace longer regional ones. This comparative case study was carried out in the single-family housing areas of Sundsberg in Kirkkonummi, Landbo in Helsinki and Ylästö in Vantaa, as well as in the inner city apartment building areas of Punavuori and Katajanokka in Helsinki. The data comprise residential surveys, interviews, and statistics and GIS data sets that illustrate regional daily mobility, socio-economic structure and vis-à-vis housing stock.

Relevance:

60.00%

Publisher:

Abstract:

The modularity of the supramolecular synthon is used to obtain transferability of charge density derived multipolar parameters for structural fragments, thus creating an opportunity to derive charge density maps for new compounds. On the basis of high resolution X-ray diffraction data obtained at 100 K for three compounds (methoxybenzoic acid, acetanilide, and 4-methyl-benzoic acid), multipole parameters for O-H···O carboxylic acid dimer and N-H···O amide infinite chain synthon fragments have been derived. The robustness associated with these supramolecular synthons has been used to model charge density derived multipolar parameters for 4-(acetylamino)benzoic acid and 4-methylacetanilide. The study provides pointers to the design and fabrication of a synthon library of high resolution X-ray diffraction data sets. It has been demonstrated that the derived charge density features can be exploited in both intra- and intermolecular space for any organic compound based on transferability of multipole parameters. The supramolecular synthon based fragments approach (SBFA) has been compared with experimental charge density data to check the reliability of use of this methodology for transferring charge density derived multipole parameters.

Relevance:

60.00%

Publisher:

Abstract:

Our ability to infer the protein quaternary structure automatically from atom and lattice information is inadequate, especially for weak complexes and heteromeric quaternary structures. Several approaches exist, but they have limited performance. Here, we present a new scheme to infer protein quaternary structure from lattice and protein information, with all-around coverage for strong, weak and very weak affinity homomeric and heteromeric complexes. The scheme combines a naive Bayes classifier and point group symmetry under a Boolean framework to detect quaternary structures in the crystal lattice. It consistently produces at least 90% coverage across diverse benchmarking data sets, including a notably superior 95% coverage for recognition of heteromeric complexes, compared with 53% on the same data set for the current state-of-the-art method. The detailed study of a limited number of prediction-failed cases offers interesting insights into the intriguing nature of protein contacts in the lattice. The findings have implications for accurate inference of quaternary states of proteins, especially weak affinity complexes.

Relevance:

60.00%

Publisher:

Abstract:

Ligand-induced conformational changes in proteins are of immense functional relevance. It is a major challenge to elucidate the network of amino acids that are responsible for the percolation of ligand-induced conformational changes to distal regions in the protein from a global perspective. Functionally important subtle conformational changes (at the level of side-chain noncovalent interactions) upon ligand binding or as a result of environmental variations are also elusive in conventional studies such as those using root-mean-square deviations (r.m.s.d.s). In this article, the network representation of protein structures and their analyses provides an efficient tool to capture these variations (both drastic and subtle) in atomistic detail in a global milieu. A generalized graph theoretical metric, using network parameters such as cliques and/or communities, is used to determine similarities or differences between structures in a rigorous manner. The ligand-induced global rewiring in the protein structures is also quantified in terms of network parameters. Thus, a judicious use of graph theory in the context of protein structures can provide meaningful insights into global structural reorganizations upon perturbation and can also be helpful for rigorous structural comparison. Data sets for the present study include high-resolution crystal structures of serine proteases from the S1A family and are probed to quantify the ligand-induced subtle structural variations.

Relevance:

60.00%

Publisher:

Abstract:

Combinatorial exchanges are double sided marketplaces with multiple sellers and multiple buyers trading with the help of combinatorial bids. The allocation and other associated problems in such exchanges are known to be among the hardest to solve among all economic mechanisms. It has been shown that the problems of surplus maximization or volume maximization in combinatorial exchanges are inapproximable even with free disposal. In this paper, the surplus maximization problem is formulated as an integer linear programming problem and we propose a Lagrangian relaxation based heuristic to find a near optimal solution. We develop computationally efficient tâtonnement mechanisms for clearing combinatorial exchanges where the Lagrangian multipliers can be interpreted as the prices of the items set by the exchange in each iteration. Our mechanisms satisfy Individual-rationality and Budget-nonnegativity properties. The computational experiments performed on representative data sets show that the proposed heuristic produces a feasible solution with negligible optimality gap.