Introduction

In this guide, we’ll walk you through the installation of the BioLizardStyleR package and provide a brief overview of its functions using a practical example. The BioLizardStyleR package helps you make ggplot2 plots in the BioLizard style. It includes an integrated font installer, a dedicated ggplot2 theme, custom color functions, and a tool to add a BioLizard footer and export the plot.

Rmd CSS style guide

The box below shows how to BioLizard-ify a Rmd knit to an HTML using a CSS style file. In the YAML header, you can set output to html_document with two options: toc: true and toc_float: true.

@import url('https://fonts.googleapis.com/css?family=Red+Hat+Display');

/*** default styling ***/
html, body, h1, h2, h3, h4, h5, h6 {
font-family: 'Red Hat Display', sans-serif;
}

div.main-container {
background-color: white;
max-width: 1200px !important;
padding: 40px;
margin-top: 50px;
margin-bottom: 50px;
border-radius: 15px;
box-shadow: 0px 10px 30px rgba(0, 0, 0, 0.2);
}

h4.author em {
font-size: 1rem;
margin-top: 2rem;
font-weight: 600;
color: #000000;
}

h4.date em {
color: #000000;
font-size: 1rem;
}

h1:not(.title) {
color: #009944;
}

h2 {
color: #0d47a1;
}

h3 {
color: #FF6F59;
}

body {
background: linear-gradient(180deg, #faf4ed 0%, #badeed 100%);
background-attachment: scroll;
}

1. Installation

To install the BioLizardStyleR package directly from GitHub, you will need to utilize the devtools package. The package requires our signature font “Red Hat Display”. If the font is not yet installed, installation will be attempted automatically when you load the library.

if (!requireNamespace("devtools", quietly = TRUE)) {
install.packages("devtools")
}

# make sure to look up BioConductor packages (used in the vignette)
setRepositories(ind = c(1:6, 8))

devtools::install_github("lizard-bio/nature-grade-visualization-playground", subdir="BioLizardStyleR")

For building the vignette, you also need the pasilla dataset. In case of issues with the installation of this package, you might have to install libbz2 (libbz2-dev (deb), libbz2-devel (rpm), bzip2 (brew)).

library(BioLizardStyleR)
#> Loading required package: ggplot2
#> Red Hat Display loaded via Google Fonts. Ready for BLZ style!

If automatic font installation fails on your system, you can install the font manually using the following steps:

Download Fonts: Download the font files from the GitHub repository Nature Grade Visualization Playground.
Install Fonts: Install the downloaded fonts on your system with your favorite font manager. Double-clicking the .ttf file will already open your font manager in many cases.

2. Crafting a BioLizard-Styled ggplot

Transforming your ggplot to match the BioLizard style involves a three-step process. Let’s illustrate this on a basic ggplot.

library(ggplot2)
data("mtcars")
mtcars$gear <- factor(mtcars$gear, levels = c(3, 4, 5), ordered = TRUE)

testplot <- ggplot2::ggplot(data = mtcars, ggplot2::aes(x = hp, y = mpg)) + 
  ggplot2::geom_point(ggplot2::aes(color = gear),size=3) +  
  ggplot2::labs(title = "Miles per Gallon vs. Horsepower",  
                 x = "Horsepower",  
                 y = "Miles per Gallon",  
                 color = "Gears")
testplot

2.1 Applying the Biolizard theme

In the initial step, we implement the biolizard theme through the lizard_style() function. This not only modifies various theme settings but also adjusts the font. An important note here is that the lizard theme can only act as a starting template! Depending on your specific plot, title length, variables in use, and other factors, you might need to make further theme adjustments.

testplot <- testplot + lizard_style()
testplot

Tip: you can use theme_set(lizard_style()) at the beginning of your script or report to set the theme to lizard_style() in all plots, without having to add + lizard_style() to each plot individually. Te reset the theme to the ggplot2 default, you can call reset_theme_settings().

2.2 Adding a BioLizard color palette

Once you’ve styled your plot, you can enhance it with a BioLizard color palette. You have the option to choose from qualitative, sequential, or divergent palettes tailored for either discrete or continuous variables. These palettes incorporate the distinct hues of the BioLizard house colors, while ensuring they are color-blind friendly and perceptually uniform. For a deeper understanding of each palette, it’s recommended to consult the documentation for the respective function.

The coloring functions within the package adhere to a flexible structure: scale_color/fill_biolizard. The type (discrete/continuous) and the scheme must be determined as function parameters.

There are 7 available schemes:

qualitative: [discrete data only] Colorblind-safe, qualitative color palette featuring the 8 saturated colors from the BioLizard brand book.
paired: [discrete data only] Colorblind-safe qualitative color palette featuring the colors in the Biolizard brand book, especially suited for levels that are related 2-by-2.
hues: [discrete data only] Maps each level to an evenly spaced hue on the color wheel, starting with the dark green from the brand book. DOES NOT generate colorblind-safe palettes.
sequential: Sequential, colorblind-safe and perceptually uniform color palette inspired by the light and darker shades of blue in the BioLizard brand book.
divergent: Divergent, colorblind-safe and perceptually uniform (within each branch) color palette inspired by the green-blue color gradient in the biolizard brand book.
beige_blue: Continuous, perceptually uniform, color-blind friendly color scale inspired by the beige and blue colors in the brand book.
beige_gn_blue: Continuous, perceptually uniform, color-blind friendly color scale inspired by the beige, green and blue colors in the brand book.

data("mtcars")
mtcars$gear <- as.factor(mtcars$gear)

# Base plot
testplot <- ggplot2::ggplot(data = mtcars, ggplot2::aes(x = hp, y = mpg)) +
  ggplot2::geom_point(aes(color = gear), size=3) +
  ggplot2::labs(
    title = "Miles per Gallon vs. Horsepower",
    x = "Horsepower",
    y = "Miles per Gallon",
    color = "Gears"
  )

# Applying the qualitative color scale
testplot_qualitative <- testplot + 
  scale_color_biolizard(type = "discrete", scheme = "qualitative") + 
  lizard_style()
testplot_qualitative


# Applying the sequential color scales (assuming you have a different dataset or aesthetic)
testplot_sequential <- testplot + 
  scale_color_biolizard(type = "discrete", scheme = "sequential") + 
  lizard_style()
testplot_sequential


## sequential scale inspired by BioLizard beige and blue (assuming you have a different dataset or aesthetic)
testplot_sequential <- testplot + 
  scale_color_biolizard(type = "discrete", scheme = "beige_blue") + 
  lizard_style()
testplot_sequential


# Applying the divergent color scale (assuming you have a different dataset or aesthetic)
testplot_divergent <- testplot + 
  scale_color_biolizard(type = "discrete", scheme = "divergent") + 
  lizard_style()
testplot_divergent

It is possible to reverse the color scales by setting reverse = TRUE

testplot_divergent <- testplot + 
  scale_color_biolizard(type = "discrete", scheme = "divergent", reverse = TRUE) +
  lizard_style()
testplot_divergent

The discrete color palettes can also be called directly, in case you would like to use them outside ggplot.

#use color palette
mycolors <- biolizard_pal_sequential(3, reverse = TRUE)
gearcolors <- mycolors[as.factor(mtcars$gear)]

graphics::plot(x = mtcars$hp, y = mtcars$mpg, pch = 19, col = gearcolors, family = "Red Hat Display")
graphics::legend("topright", legend = paste("Gears", 3:5), col = mycolors, pch = 19, bty = "n")

# #use BLZ base colors
# mycolors <- c(blz_green, blz_blue, blz_yellow)
# gearcolors <- mycolors[as.factor(mtcars$gear)]
# 
# graphics::plot(x = mtcars$hp, y = mtcars$mpg, pch = 19, col = gearcolors, family = "Red Hat Display")
# graphics::legend("topright", legend = paste("Gears", 3:5), col = mycolors, pch = 19, bty = "n")

2.3 Adding a BioLizard footer and exporting the plot.

For a unique touch on your visualizations, integrate a BioLizard footer using the finalise_lizardplot() function. This function not only positions the logo neatly at the bottom right but also provides an option to specify your data source directly in the footer, offering an alternative to the caption section of ggplot. When ready to export, either assign a name to your plot using output_name or specify a directory with save_filepath. The device argument allows to choose from the file types “png” (default), “pdf”, “svg”, or “ps”.

finalise_lizardplot(
  testplot + 
    scale_color_biolizard(type = "discrete", scheme = "qualitative") + 
    lizard_style(),
  source='Source: The mtcars dataset')
#> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family 'Nunito Sans 10pt Medium' not found, will use 'sans' instead
#> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family 'Nunito Sans 10pt Medium' not found, will use 'sans' instead
#> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family 'Nunito Sans 10pt Medium' not found, will use 'sans' instead

3. Examples

3.1 Simple bar chart

x = 1:8
y = 1:8

df <- data.frame(x=x, y=y)
ggplot2::ggplot(df, ggplot2::aes(x=x, y=y, fill = factor(x))) +
  ggplot2::geom_col() +
  scale_fill_biolizard() +
  lizard_style()

3.2 Simple polygon plot

ids <- factor(c("1.1", "2.1", "1.2", "2.2", "1.3", "2.3"))
values <- data.frame(
  id = ids,
  value = c(3, 3.1, 3.1, 3.2, 3.15, 3.5)
)
positions <- data.frame(
  id = rep(ids, each = 4),
  x = c(2, 1, 1.1, 2.2, 1, 0, 0.3, 1.1, 2.2, 1.1, 1.2, 2.5, 1.1, 0.3,
        0.5, 1.2, 2.5, 1.2, 1.3, 2.7, 1.2, 0.5, 0.6, 1.3),
  y = c(-0.5, 0, 1, 0.5, 0, 0.5, 1.5, 1, 0.5, 1, 2.1, 1.7, 1, 1.5,
        2.2, 2.1, 1.7, 2.1, 3.2, 2.8, 2.1, 2.2, 3.3, 3.2)
)

datapoly <- merge(values, positions, by = c("id"))

ggplot2::ggplot(datapoly, ggplot2::aes(x = x, y = y)) +
  ggplot2::geom_polygon(ggplot2::aes(group = id, fill = id)) +
  scale_fill_biolizard(type = "discrete", scheme = "beige_gn_blue") +
  lizard_style()

3.3 Simple box plot

ggplot2::ggplot(data = mtcars, ggplot2::aes(x = as.factor(gear), y = mpg)) +
  ggplot2::geom_boxplot() +
  ggplot2::labs(
    title = "Miles per Gallon vs. Gears",
    x = "Gears",
    y = "Miles per Gallon"
  ) +
  lizard_style()

A violin plot is similar to a boxplot, but captures more information on the data distribution:

ggplot2::ggplot(data = mtcars, ggplot2::aes(x = as.factor(gear), y = mpg)) +
  ggplot2::geom_violin() +
  ggplot2::labs(
    title = "Miles per Gallon vs. Gears",
    x = "Gears",
    y = "Miles per Gallon"
  ) +
  lizard_style()

3.4 Simple density plot

ggplot2::ggplot(data = mtcars, ggplot2::aes(x = mpg)) +
  ggplot2::geom_density() +
  lizard_style()

3.5 Simple scatterplot

ggplot2::ggplot(data = mtcars, ggplot2::aes(x = hp, y = mpg)) +
  ggplot2::geom_smooth() +
  ggplot2::geom_point(ggplot2::aes(color= mpg)) +
  ggplot2::labs(
    title = "Miles per Gallon vs. horsepower",
    x = "Horsepower",
    y = "Miles per Gallon",
    color = "Miles per Gallon",
    fill = "Miles per Gallon"
  ) +
  scale_color_biolizard(type = "continuous", scheme = "sequential") +
  lizard_style()
#> `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

3.6 Facets

Faceting is a great way to lay out related plots side-by-side, with the axes either on the same scale or on different scales.

# use facet_wrap to stratify plots by one variable
ggplot2::ggplot(mtcars, ggplot2::aes(x=mpg, y=hp)) +
  ggplot2::geom_point() +
  ggplot2::facet_wrap(ggplot2::vars(gear)) +
  lizard_style() +
  ggplot2::theme(axis.text.x = ggplot2::element_text(size = 8, angle = 90, vjust = 0.5))


# use facet_grid for two variables
ggplot2::ggplot(mtcars, ggplot2::aes(x=mpg, y=hp)) +
  ggplot2::geom_point() +
  ggplot2::facet_grid(rows = ggplot2::vars(gear), cols = ggplot2::vars(carb)) +
  lizard_style() +
  ggplot2::theme(axis.text.x = ggplot2::element_text(size = 8, angle = 90, vjust = 0.5))

Note that the minimal style makes it difficult to distinguish the different blocks from each other. Readability of the figure is improved by plotting the axes for each plot, but they are removed by facet_grid and facet_wrap when the plots are on the same scale. The similar functions facet_rep_grid and facet_rep_wrap from the lemon package are more suitable with the lizard style, since they will show all axes.

library("lemon")

# use facet_wrap to stratify plots by one variable
ggplot2::ggplot(mtcars, ggplot2::aes(x=mpg, y=hp)) +
  ggplot2::geom_point() +
  lemon::facet_rep_wrap(ggplot2::vars(gear)) +
  lizard_style() +
  ggplot2::theme(axis.text.x = ggplot2::element_text(size = 8, angle = 90, vjust = 0.5))


# use facet_grid for two variables
ggplot2::ggplot(mtcars, ggplot2::aes(x=mpg, y=hp)) +
  ggplot2::geom_point() +
  lemon::facet_rep_grid(rows = ggplot2::vars(gear), cols = ggplot2::vars(carb)) +
  lizard_style() +
  ggplot2::theme(axis.text.x = ggplot2::element_text(size = 8, angle = 90, vjust = 0.5))

3.7 Treemap plots

Treemap plots are a powerful way of visualizing hierarchical data such as GO terms, for example. The size of the rectangles corresponds to the value you want to show (e.g. p value), and related rectangles are grouped together. This example makes use of the Pokémon dataset from the highcharter package and is based on https://yjunechoe.github.io/posts/2020-06-30-treemap-with-ggplot/.

In this example, more colors are requested than are present in the qualitative color scales. The biolizard_pal_hue palette was developped for situations like this. It is a specific version of the scales::pal_hue function which is used by default by ggplot2, that starts at Biolizard’s signature green.

The treemapify package allows plotting treemaps with ggplot2.

library(treemapify)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

data("pokemon", package = "highcharter")
# Cleaning up data for a treemap
data <- pokemon |>
  dplyr::select(pokemon, type_1, type_2) |>
  dplyr::mutate(type_2 = ifelse(is.na(type_2), paste("only", type_1), type_2)) |> 
  dplyr::group_by(type_1, type_2) |>
  dplyr::summarise(n = length(pokemon)) |> 
  dplyr::ungroup()
#> `summarise()` has grouped output by 'type_1'. You can override using the
#> `.groups` argument.

head(data)
#> # A tibble: 6 × 3
#>   type_1 type_2       n
#>   <chr>  <chr>    <int>
#> 1 bug    electric     4
#> 2 bug    fairy        2
#> 3 bug    fighting     3
#> 4 bug    fire         2
#> 5 bug    flying      13
#> 6 bug    ghost        1

# plot the treemap
ggplot2::ggplot(data, ggplot2::aes(area = n, label = type_2, fill = type_1,
                                   subgroup = type_1)) +
  ggplot2::ggtitle("Number of pokémon stratified by type") +
  # 1. Draw type_2 borders and fill colors
  treemapify::geom_treemap() +
  # 2. Draw type_1 borders
  treemapify::geom_treemap_subgroup_border() +
  # 3. Print type_1 text
  treemapify::geom_treemap_subgroup_text(place = "centre", grow = T, alpha = 0.5, colour = "black",
                                         fontface = "italic", min.size = 0,
                                         family = "Red Hat Display") +   #need to add "Red Hat Display" manually as this text is not accessed in theme()
  # 4. Print type_2 text
  treemapify::geom_treemap_text(colour = "white", place = "topleft", reflow = T,
                                family = "Red Hat Display") +   #need to add "Red Hat Display" manually as this text is not accessed in theme()
  
  # 5. BLZ style
  scale_fill_biolizard(scheme = "beige_gn_blue") +
  lizard_style() +
  ggplot2::theme(legend.position = "none")

However, this is not ideal, since the font of the text still has to be defined manually, and thus the lizard_style() function does not work very well for these treemaps, even though they are ggplot2-based. Another option is to use the ‘treemap’ library, and move away from ggplot2 altogether, which generally leads to more aesthetically pleasing plots:

library(treemap)

# get colors from the biolizard hues palette
ncols <- length(unique(data$type_1))
palette <- biolizard_pal_beige_gn_blue(ncols)

treemap::treemap(data, index=c("type_1", "type_2"), vSize="n", type="index",
                 title="Number of pokémon stratified by type", 
                 palette=palette,  #set the BLZ color palette
                 fontcolor.labels=c("#FFFFFFDD", "#00000080"), bg.labels=0, border.col="#00000080",
                 fontfamily.title = "Red Hat Display", fontfamily.labels = "Red Hat Display", fontfamily.legend = "Red Hat Display")  #set the Red Hat Display font

3.8 Chicago plots

Idea from “A ggplot2 Tutorial for Beautiful Plotting in R”.

example_file <- system.file("extdata", "chicago-nmmaps-custom.csv", package = "BioLizardStyleR")
chic <- utils::read.csv(example_file) #An example dataset

g <- ggplot2::ggplot(chic, ggplot2::aes(x = season, y = o3,
                                        color = season)) +
  ggplot2::labs(x = "Season", y = "Ozone") + 
  ggplot2::geom_violin(linewidth = 1, alpha = .5) +
  ggplot2::geom_jitter(alpha = .25, width = .3) +
  ggplot2::coord_flip()

g + lizard_style() +
  scale_color_biolizard()

library(corrr)
library(forcats)

corm <-
  chic |> 
  dplyr::select(temp, dewpoint, pm10, o3) |> 
  corrr::correlate(diagonal = 1) |> 
  corrr::shave(upper = FALSE)
#> Correlation computed with
#> • Method: 'pearson'
#> • Missing treated using: 'pairwise.complete.obs'

corm <- corm |> 
  tidyr::pivot_longer(
    cols = -term,
    names_to = "colname",
    values_to = "corr"
  ) |> 
  dplyr::mutate(
    rowname = forcats::fct_inorder(term),
    colname = forcats::fct_inorder(colname),
    label = dplyr::if_else(is.na(corr), "", sprintf("%1.2f", corr))
  )

g <- ggplot2::ggplot(corm, ggplot2::aes(rowname, fct_rev(colname),
                                        fill = corr)) +
  ggplot2::geom_tile() +
  ggplot2::geom_text(ggplot2::aes(label = label)) +
  ggplot2::coord_fixed(expand = FALSE) +
  ggplot2::labs(x = NULL, y = NULL) 

g + scale_fill_biolizard(type='continuous',scheme='divergent',limits = c(-1, 1), 
                         na.value = "white", name="Pearson\nCorrelation") +
  lizard_style() +
  ggplot2::theme(legend.position = "inside", legend.position.inside = c(.95, .8))

g <- ggplot2::ggplot(chic, ggplot2::aes(temp, o3)) +
  ggplot2::geom_hex(color = "grey") +
  ggplot2::labs(x = "Temperature (°F)", 
                y = "Ozone Level", 
                title = 'How Temperature Affects Ozone Levels',
                subtitle ='Hexagonal Binning of Ozone Levels vs. Temperature in °F')

g <- g + lizard_style() +
  scale_fill_biolizard(type = 'continuous',scheme = 'beige_blue')
g


finalise_lizardplot(g,source="Source: The Chicago dataset (https://www.cedricscherer.com/2019/08/05/a-ggplot2-tutorial-for-beautiful-plotting-in-r/)")
#> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family 'Nunito Sans 10pt Medium' not found, will use 'sans' instead
#> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family 'Nunito Sans 10pt Medium' not found, will use 'sans' instead
#> Warning in grid.Call.graphics(C_text, as.graphicsAnnot(x$label), x$x, x$y, :
#> font family 'Nunito Sans 10pt Medium' not found, will use 'sans' instead

3.9 Interactive plots (DE analysis)

ggplotly can be used to make a ggplot interactive, for example to use in a html report. See an example of in interactive PCA plot and volcano plot below. To generate these plots, we used the per-gene read counts of the pasilla dataset. Make sure that you do not apply lizard_style() to the ggplot call when you want to use ggplotly. The function lizard_layout() applies the lizard theme to a plotly object.

Note: to succesfully install Pasilla, you might have to manually install libbz2 (libbz2-dev (deb), libbz2-devel (rpm), bzip2 (brew)).

library(pasilla)
#> Loading required package: DEXSeq
#> Loading required package: BiocParallel
#> Loading required package: Biobase
#> Loading required package: BiocGenerics
#> Loading required package: generics
#> 
#> Attaching package: 'generics'
#> The following object is masked from 'package:dplyr':
#> 
#>     explain
#> The following objects are masked from 'package:base':
#> 
#>     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#>     setequal, union
#> 
#> Attaching package: 'BiocGenerics'
#> The following object is masked from 'package:dplyr':
#> 
#>     combine
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#>     get, grep, grepl, is.unsorted, lapply, Map, mapply, match, mget,
#>     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
#>     rbind, Reduce, rownames, sapply, saveRDS, table, tapply, unique,
#>     unsplit, which.max, which.min
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'matrixStats'
#> The following objects are masked from 'package:Biobase':
#> 
#>     anyMissing, rowMedians
#> The following object is masked from 'package:dplyr':
#> 
#>     count
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> The following object is masked from 'package:Biobase':
#> 
#>     rowMedians
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following objects are masked from 'package:dplyr':
#> 
#>     first, rename
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> 
#> Attaching package: 'IRanges'
#> The following objects are masked from 'package:dplyr':
#> 
#>     collapse, desc, slice
#> Loading required package: GenomeInfoDb
#> Loading required package: DESeq2
#> Loading required package: AnnotationDbi
#> 
#> Attaching package: 'AnnotationDbi'
#> The following object is masked from 'package:dplyr':
#> 
#>     select
#> Loading required package: RColorBrewer
library(plotly)
#> 
#> Attaching package: 'plotly'
#> The following object is masked from 'package:AnnotationDbi':
#> 
#>     select
#> The following object is masked from 'package:IRanges':
#> 
#>     slice
#> The following object is masked from 'package:S4Vectors':
#> 
#>     rename
#> The following object is masked from 'package:ggplot2':
#> 
#>     last_plot
#> The following object is masked from 'package:stats':
#> 
#>     filter
#> The following object is masked from 'package:graphics':
#> 
#>     layout

# get gene counts
datafiles <- system.file("extdata", package="pasilla", mustWork=TRUE)
count.table <- utils::read.table(file.path(datafiles, "pasilla_gene_counts.tsv"), header=TRUE, row.names=1, quote="", comment.char="" )
head(count.table)
#>             untreated1 untreated2 untreated3 untreated4 treated1 treated2
#> FBgn0000003          0          0          0          0        0        0
#> FBgn0000008         92        161         76         70      140       88
#> FBgn0000014          5          1          0          0        4        0
#> FBgn0000015          0          2          1          2        1        0
#> FBgn0000017       4664       8714       3564       3150     6205     3072
#> FBgn0000018        583        761        245        310      722      299
#>             treated3
#> FBgn0000003        1
#> FBgn0000008       70
#> FBgn0000014        0
#> FBgn0000015        0
#> FBgn0000017     3334
#> FBgn0000018      308

# get metadata
SampleAnno <- utils::read.csv(file.path(datafiles, "pasilla_sample_annotation.csv"))
head(SampleAnno)
#>           file condition        type number.of.lanes total.number.of.reads
#> 1   treated1fb   treated single-read               5              35158667
#> 2   treated2fb   treated  paired-end               2         12242535 (x2)
#> 3   treated3fb   treated  paired-end               2         12443664 (x2)
#> 4 untreated1fb untreated single-read               2              17812866
#> 5 untreated2fb untreated single-read               6              34284521
#> 6 untreated3fb untreated  paired-end               2         10542625 (x2)
#>   exon.counts
#> 1    15679615
#> 2    15620018
#> 3    12733865
#> 4    14924838
#> 5    20764558
#> 6    10283129

# DE analysis with edgeR
library(edgeR)
#> Loading required package: limma
#> 
#> Attaching package: 'limma'
#> The following object is masked from 'package:DEXSeq':
#> 
#>     plotMA
#> The following object is masked from 'package:DESeq2':
#> 
#>     plotMA
#> The following object is masked from 'package:BiocGenerics':
#> 
#>     plotMA

Group <- as.factor(c(rep("untreated", 4), rep("treated", 3)))
y <- edgeR::DGEList(counts = count.table, group = Group)

# filter genes
keep <- edgeR::filterByExpr(y)
y <- y[keep, , keep.lib.sizes = FALSE]

# normalize counts
y <- edgeR::calcNormFactors(y)
norm.counts <- edgeR::cpm(y, normalized.lib.sizes = TRUE, log = T)

# plot interactive PCA
mds <- limma::plotMDS(y, gene.selection="common", plot = F)
toplot <- data.frame(Dim1 = mds$x, Dim2 = mds$y, Group = Group, sample = colnames(mds$distance.matrix.squared))
pcaPlot <- ggplot2::ggplot(toplot, ggplot2::aes(Dim1, Dim2, colour = Group)) +
  ggplot2::geom_point() + 
  ggplot2::geom_text(ggplot2::aes(label = sample), size = 3, nudge_y = 0.05) +
  ggplot2::coord_fixed() +
  ggplot2::xlab(paste0("PC1: ", round(mds$var.explained[1]*100), "% variance")) +
  ggplot2::ylab(paste0("PC2: ", round(mds$var.explained[2]*100), "% variance")) +
  scale_color_biolizard() 

plotly::ggplotly(pcaPlot) |> lizard_layout()


# The lizard style is added through lizard_layout() on plotly instead of lizard_style() in ggplot

# fit general linear model
design <- stats::model.matrix(~ 0 + Group)
colnames(design) <- levels(Group)
y <- edgeR::estimateDisp(y, design)
fit <- edgeR::glmQLFit(y, design)

# fit contrasts of interest
my.contrasts <- limma::makeContrasts(treatedVsUntreated = treated-untreated,
                                     levels=design)

lrt <- edgeR::glmQLFTest(fit, contrast = my.contrasts)

DE <- edgeR::topTags(lrt, n="all")$table
colnames(DE) <- c("log2FoldChange", "logCPM", "LR", "pvalue", "padj")
DE <- DE[order(DE$padj), ]
DE <- DE[!is.na(DE$padj), ]
head(DE)
#>             log2FoldChange   logCPM        LR       pvalue         padj
#> FBgn0039155      -4.605077 5.882320 1032.5624 3.519114e-11 2.786786e-07
#> FBgn0025111       2.906975 6.925429  718.6612 2.001949e-10 7.926717e-07
#> FBgn0029167      -2.189661 8.222828  527.6780 8.769068e-10 2.151196e-06
#> FBgn0035085      -2.549919 5.685708  504.5058 1.086600e-09 2.151196e-06
#> FBgn0003360      -3.171310 8.451316  467.1444 1.567356e-09 2.482378e-06
#> FBgn0039827      -4.149923 4.399320  412.0585 2.861632e-09 3.476204e-06

# interactive volcano plot
p <- DE |> 
  ggplot2::ggplot(ggplot2::aes(x = log2FoldChange, y = -log10(pvalue))) +
  ggplot2::geom_point(ggplot2::aes(color = padj < 0.05)) +
  scale_color_biolizard(reverse = TRUE)

plotly::ggplotly(p) |> lizard_layout()