A dispatch from MOSAiC Leg 4

PhD student Emelia Chamberlain sends the following dispatch from Polarstern.

Operating an international expedition in the remote central Arctic would be a logistically taxing endeavor at the best of times. Operating an international expedition in the remote central Arctic during a global pandemic is that much more challenging. But through the incredible perseverance of a delayed Leg 3 team and hard work from the dedicated logistical teams at the Alfred Wegener Institute in Germany, MOSAiC Leg 4 is underway!

Our NSF-funded project work on MOSAiC will be carried out on this leg by URI post-doc Alessandra D’Angelo and myself. In a reshuffling of operational plans, I found myself headed north almost a month earlier than expected (and Alessandra almost a month later). On May 1, 2020, approximately one hundred scientists and crew began two weeks of quarantine in a local hotel in Bremerhaven. For the first week we were in total isolation within individual hotel rooms, with meals brought to our doors by the incredible hotel staff. We each took two coronavirus tests, the first on arrival and the second seven days later, and were to remain in individual quarantine until the results of the second test came back. Thankfully, everyone tested negative, and after seven long days we entered Phase 2 – group quarantine. Even weeks and 3,000 km away from the Polarstern, MOSAiC Leg 4 had finally begun. Tentatively at first, we emerged from our rooms to gather for meals, planning, and, perhaps most importantly, safety briefings. While we were able to be around each other, we still practiced strict social distancing precautions.

Here I play the hypothermia victim during an Arctic Safety training. During melt season, falling into the freezing Arctic waters poses one of the greatest dangers while working out on the ice.
On this leg, I will serve as a member of the BioGeoChemistry team – we are principally interested in studying how climate-relevant gases cycle through the central Arctic’s ocean-ice-atmosphere system. Pictured here are myself, URI post-doc and Leg 4 BGC team lead Alessandra D’Angelo, and U. Groningen post-doc Deborah Bozzato. The fourth and final member of our team, Falk Pätzold, was already onboard Polarstern, having also participated in Leg 3.

After another 10 days, and negative test results for all, we began our journey northward on the R/V Maria S. Merian and R/V Sonne to meet the MOSAiC workhorse, R/V Polarstern. While overall lovely research vessels, the Merian and Sonne are not icebreakers, meaning the Polarstern would need to travel south of the ice edge to refuel and make the personnel exchange. Therefore, all ships coordinated to meet in Adventfjorden, Svalbard. It was a long but beautiful journey. Due to some unexpected ice pressure preventing southerly travel, Polarstern was delayed, and we ended up reaching the designated meeting point (near Longyearbyen, Norway) almost two weeks ahead of her! I spent most of the time planning, catching up on some coding, and joining the many group sporting activities held on board, all in preparation for the labor-intensive ice activities. (Zumba and yoga are of course perfect analogs for pulling sledges of equipment across slushy sea ice…)

Turns out, scientists have a lot of luggage – most of it heavy scientific equipment. After several iterations of moving all of this gear on and off multiple modes of transport (buses, ships, etc.), we have the assembly line down pat.
“C is for container!” The container behind this massive pile of luggage was my home for most of my time on the Merian. Between the two ships there were not enough cabins for the entire Leg 4 team, so a few of us were tucked into these cozy makeshift apartments. To commemorate the experience, we took a “container crew” photo during move out.
Polarstern arriving in Svalbard.

Once the Polarstern finally arrived in Adventfjorden, a flurry of handover activities commenced. We met with the Leg 3 team, explored the labs, reviewed protocols, and heard all about their experience on the ice. Prior to their departure from the MOSAiC ice floe, there was a dynamic shift in the ice that split some sampling sites from the rest. This, however, is not surprising given the floe’s fast trajectory south and the onset of the summer melt season. The ice drift (which can be tracked here in real time) has brought the MOSAiC floe to approximately 82° latitude, air temperatures mostly remain above freezing, and surface water is currently measured at -1.7°C (warm for under-ice). With polar summer in full swing and such exciting ice dynamics, I look forward to getting back to the floe and tracking the ecological changes through this transition! But first, we must get there. I’m hoping to utilize this next phase of transit to explore the R/V Polarstern. Since it will be my new home for the next ~4.5 months, I suppose I should learn how to find my way from the lab to the mess hall without getting lost…

Before the ships pulled up side by side and deployed the gangplank, people were ferried from ship to ship by small boats in order to make the most of the few days we had for knowledge transfer and handover activities.
While checking out our lab onboard Polarstern, I ran into Igor, the lab’s totem. Thus far he has done a pretty good job of keeping the instruments in check and running, but I hope he will be pleased that I’ve brought him a friend.
Leg 4 waves goodbye from a departing Polarstern to those Leg 3 participants traveling home onboard the Merian. Bye Svalbard, to the North!
Posted in MOSAiC | Leave a comment

Looking back to South Bay Salt Works 2019

Many thanks to Michelle Babcock, communications specialist for the OAST project, for producing this great video of our 2019 field effort at the South Bay Salt Works in San Diego. This field effort was supposed to be a low-key opportunity for the OAST team to practice working together in the field, but the ideal nature of the site and some of our preliminary findings have elevated it to a primary field site. It’s a good thing too… with COVID-19 making a hash of international travel plans it’s the one place we know we can reach!

Posted in Uncategorized | Leave a comment

Tutorial: superSOMs and an R script for detecting regions of interest

A common exercise in environmental microbiology is counting bacterial cells with an epifluorescent microscope. During my PhD I spent many hours hunched over a microscope in a darkened room, contemplating which points of light were bacteria (and should thus be counted) and which were not. With a large cup of coffee from Seattle’s U District Bean and Bagel and an episode of “This American Life” playing in the background it’s not a bad way to spend the day. But it definitely felt like a procedure that needed some technological enhancement. My biggest concerns were always objectivity and reproducibility; it’s really hard to determine consistently which “regions of interest” (ROIs) to count. There are of course commercial software packages for identifying and counting ROIs in a microscope image. But why pay big money for a software subscription when you can write your own script? I had some free time during our slow transit to Polarstern during MOSAiC Leg 3 and thought I’d give it a try. The following tutorial borrows heavily from the image segmentation example in Wehrens and Kruisselbrink, 2018.

We start with a png image from a camera attached to a microscope. The green features are bacteria and phytoplankton that have been stained with Sybr Green. These are the ROIs that we’d like to identify and count. The image quality is actually pretty bad here; this was made with the epifluorescent scope at Palmer Station back in 2015, and the scope and camera needed some TLC. It turns out that I don’t actually have many such images on my laptop, and I can’t simply go and make a new one because we aren’t allowed in the lab right now! Nonetheless the quality is sufficient for our purposes.

First, we need to convert the image into an RGB matrix that we can work with. I’m sure there’s a way to do this in R, but I couldn’t find an expedient method. Python made it easy.

## convert image to two matrices: a 3 column RGB matrix and
## 2 column xy matrix

import matplotlib.image as mpimg

name = '15170245.png'
name = name[:-len('.png')] # strip the extension; rstrip('.png') would strip characters, not the suffix

img = mpimg.imread(name + '.png')

with open(name + '.rgb4r.csv', 'w') as rgb_out, open(name + '.xy.csv', 'w') as xy_out:
    for i in range(0, img.shape[1]):     # image columns (x)
        for j in range(0, img.shape[0]): # image rows (y)
            print(img[j, i, 0], img[j, i, 1], img[j, i, 2], sep = ',', file = rgb_out)
            print(i + 1, j + 1, sep = ',', file = xy_out)
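As an aside, there is a reasonably expedient way to do this in R after all, using the png package. This is my own sketch, not part of the original workflow, and it lets you skip the intermediate CSV files. readPNG returns a rows × cols × channels array with values between 0 and 1, the same scale that mpimg.imread produces.

## a hedged R alternative to the Python conversion, using the png package

library(png)

img <- readPNG('15170245.png')

x.dim <- dim(img)[2]
y.dim <- dim(img)[1]

## as.vector() unrolls column-major (all rows of column 1, then column 2, ...),
## matching the pixel order of the Python output

scope.rgb <- data.frame(red = as.vector(img[,,1]),
                        green = as.vector(img[,,2]),
                        blue = as.vector(img[,,3]))

scope.xy <- data.frame(X = rep(1:x.dim, each = y.dim),
                       Y = rep(1:y.dim, times = x.dim))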

Next we break out R to do the actual analysis (which yes, could be done in Python…). The basic strategy is to use a self organizing map (SOM) with 2 layers. One layer will be color, the other will be position. We’ll use this information to identify distinct classes corresponding to diagnostic features of ROIs. Last, we’ll iterate across all the pixels that appear to belong to ROIs and attempt to draw an ellipse around each group of pixels that makes up a ROI. First, we read in the data produced by the Python script:

scope.rgb <- read.csv('15170245.rgb4r.csv', header = F)
scope.xy <- read.csv('15170245.xy.csv', header = F)

colnames(scope.rgb) <- c('red', 'green', 'blue')
colnames(scope.xy) <- c('X', 'Y')

Then we define a function to render the image described by these matrices:

plotImage <- function(scope.xy, scope.rgb){
  image.col <- rgb(scope.rgb[,"red"], scope.rgb[,"green"], scope.rgb[,"blue"], maxColorValue = 1)
  x.dim <- max(scope.xy$X)
  y.dim <- max(scope.xy$Y)
  
  temp.image <- 1:dim(scope.rgb)[1]
  dim(temp.image) <- c(y.dim, x.dim)
  image(0:x.dim,
        0:y.dim,
        t(temp.image),
        col = image.col,
        ylab = paste0('Y:', y.dim),
        xlab = paste0('X:', x.dim))
}

## give it a test

plotImage(scope.xy, scope.rgb)

You'll note that the function flips the image. While annoying, this doesn't matter at all for identifying ROIs. If it bothers you go ahead and tweak the function :). Now we need to train our SOM. The SOM is what does the heavy lifting of identifying different types of pixels in the image.
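For reference, a minimal version of that tweak (my own sketch, assuming the flip is purely vertical): reverse the row order of the index matrix before plotting, and the image comes out right side up.

plotImageFlipped <- function(scope.xy, scope.rgb){
  image.col <- rgb(scope.rgb[,"red"], scope.rgb[,"green"], scope.rgb[,"blue"], maxColorValue = 1)
  x.dim <- max(scope.xy$X)
  y.dim <- max(scope.xy$Y)
  
  temp.image <- 1:dim(scope.rgb)[1]
  dim(temp.image) <- c(y.dim, x.dim)
  temp.image <- temp.image[rev(1:y.dim),] # reverse row order to undo the vertical flip
  
  image(0:x.dim,
        0:y.dim,
        t(temp.image),
        col = image.col,
        ylab = paste0('Y:', y.dim),
        xlab = paste0('X:', x.dim))
}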

#### train som ####

library(kohonen)

som.grid <- somgrid(10, 10, 'hexagonal')
som.model <- supersom(list('rgb' = data.matrix(scope.rgb), 'coords' = data.matrix(scope.xy)),
                      whatmap = c('rgb', 'coords'),
                      user.weights = c(1, 9),
                      grid = som.grid)

Now partition the SOM into k classes with k-means clustering. The value for k has to be determined experimentally but should be consistent for all the images in a set (i.e. a given type of microscopy image).
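One quick way to bound that experimentation (my own addition, not part of the original workflow) is to scan candidate values of k and look for an elbow in the total within-cluster sum of squares of the codebook vectors:

## scan candidate values of k and look for an elbow

try.k <- 2:15
wss <- sapply(try.k, function(k) kmeans(som.model$codes[[1]], centers = k, nstart = 5)$tot.withinss)

plot(try.k, wss, type = 'b', xlab = 'k', ylab = 'total within-cluster SS')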

som.codes <- som.model$codes[[1]]
som.codes.k <- kmeans(som.codes, centers = 6)

## Get mean colors for clusters (classes)

class.col <- c()

for(k in 1:max(som.codes.k$cluster)){
  temp.codes <- som.codes[which(som.codes.k$cluster == k),]
  temp.col <- colMeans(temp.codes)
  temp.col <- rgb(temp.col[1], temp.col[2], temp.col[3], maxColorValue = 1)
  class.col <- append(class.col, temp.col)
}

## Make a plot of the som with the map units colored according to mean color
## of owning class.

plot(som.model,
     type = 'mapping',
     bg = class.col[som.codes.k$cluster],
     keepMargins = F,
     col = NA)

text(som.model$grid$pts, labels = som.codes.k$cluster)
add.cluster.boundaries(som.model, som.codes.k$cluster)

Here's where we have to be a bit subjective. We need to make an informed decision about which classes constitute ROIs. Looking at this map I'm gonna guess 3 and 6. The classes and structure of your map will of course be totally different, even if you start with the same training image. To make use of this information we first predict which classes our original pixels belong to.

## predict for RGB only

image.predict <- predict(som.model, newdata = list('rgb' = data.matrix(scope.rgb)), whatmap = 'rgb')

Then we identify those pixels that belong to the classes we think describe ROIs.

## select units that correspond to ROIs

target.units <- which(som.codes.k$cluster %in% c(3, 6))
target.pixels <- scope.xy[which(image.predict$unit.classif %in% target.units), c('X', 'Y')]

Now comes the tricky bit. Knowing which pixels belong to ROIs isn't actually that useful, as each ROI is composed of many different pixels. So we need to aggregate the pixels into ROIs. Again, this requires a little experimentation, but once you figure it out for a given sample type it should work consistently. The key parameter here is "resolution" which we define as how far apart two pixels of the same class need to be to constitute different ROIs. The value 20 seems to work reasonably well for this image.

## loop through all pixels.  if there's a pixel within n distance of it, check to
## see if that pixel belongs to an ROI.  If so, add the new pixel to that area.  If not,
## create a new ROI.  Currently a pixel can be "grabbed" by an ROI produced later.

findROI <- function(resolution = 20){
  roi <- 1
  roi.pixels <- as.data.frame(matrix(nrow = dim(target.pixels)[1], ncol = 3))
  colnames(roi.pixels) <- c('X', 'Y', 'roi')
  
  for(i in 1:dim(target.pixels)[1]){
    
    if(is.na(roi.pixels[i, 'roi'])){
      pixel.x <- target.pixels[i, 'X']
      pixel.y <- target.pixels[i, 'Y']
      nns <- which(abs(target.pixels[, 'X'] - pixel.x) < resolution & abs(target.pixels[, 'Y'] - pixel.y) < resolution)
      
      roi.pixels[nns, c('X', 'Y')] <- target.pixels[nns, c('X', 'Y')]
      roi.pixels[nns, 'roi'] <- roi
      roi <- roi + 1
    }
  }
  return(roi.pixels)
}
  
roi.pixels <- findROI()
roi.table <- table(roi.pixels$roi)
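Each entry in roi.table is one candidate ROI, so the putative cell count is just the number of entries. If you suspect that very small ROIs are noise you could also filter on size; this is my own optional addition, and the 5 pixel threshold is a guess that should be tuned per sample type.

## number of ROIs found, i.e. the putative cell count

length(roi.table)

## optionally drop very small ROIs as probable noise

roi.keep <- names(roi.table)[roi.table >= 5]
roi.pixels <- roi.pixels[roi.pixels$roi %in% roi.keep,]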

To evaluate our discovery of ROIs we plot an ellipse around each ROI in the original image.

## approximate each roi as an ellipse.  need x, y, and the semi-axes a, b

plotROI <- function(roi.pixels){
  require(plotrix)
  
  for(roi in unique(roi.pixels$roi)){
    temp.pixels <- roi.pixels[which(roi.pixels$roi == roi),]
    temp.a <- (max(temp.pixels$X) - min(temp.pixels$X)) / 2 # draw.ellipse expects semi-axes, so half the range
    temp.b <- (max(temp.pixels$Y) - min(temp.pixels$Y)) / 2
    temp.x <- mean(temp.pixels$X)
    temp.y <- mean(temp.pixels$Y)
    
    draw.ellipse(temp.x, temp.y, temp.a, temp.b, border = 'red')
  }
}

plotImage(scope.xy, scope.rgb)
plotROI(roi.pixels)

It certainly isn't perfect; the two chained diatoms in particular throw off our count. We did, however, do a reasonable job of finding all the small ROIs that represent the smaller, harder-to-count cells. So how does the model perform for ROI identification on a new image? Here's a new image acquired with the same exposure settings on the same scope. We use the same Python code to convert it to RGB and XY matrices.

## convert image to two matrices: a 3 column RGB matrix and
## 2 column xy matrix

import matplotlib.image as mpimg

name = '14000740.png'
name = name[:-len('.png')] # strip the extension, as before

img = mpimg.imread(name + '.png')

with open(name + '.rgb4r.csv', 'w') as rgb_out, open(name + '.xy.csv', 'w') as xy_out:
    for i in range(0, img.shape[1]):
        for j in range(0, img.shape[0]):
            print(img[j, i, 0], img[j, i, 1], img[j, i, 2], sep = ',', file = rgb_out)
            print(i + 1, j + 1, sep = ',', file = xy_out)

Then we predict and plot.

scope.rgb <- read.csv('14000740.rgb4r.csv', header = F)
scope.xy <- read.csv('14000740.xy.csv', header = F)

colnames(scope.rgb) <- c('red', 'green', 'blue')
colnames(scope.xy) <- c('X', 'Y')

plotImage(scope.xy, scope.rgb)
image.predict <- predict(som.model, newdata = list('rgb' = data.matrix(scope.rgb)), whatmap = 'rgb') # predict for rgb only

target.units <- which(som.codes.k$cluster %in% c(3, 6))
target.pixels <- scope.xy[which(image.predict$unit.classif %in% target.units), c('X', 'Y')]

roi.pixels <- findROI()
roi.table <- table(roi.pixels$roi)

plotImage(scope.xy, scope.rgb)
plotROI(roi.pixels)

Not bad! Again, it isn't perfect: some ROIs are grouped together and some are missed (largely a function of the variable focal plane). These issues can be addressed by experimenting with the model parameters and resolution. We did accomplish the goal of improving objectivity and reproducibility; our approach isn't always right, but at least it's wrong in a consistent way! An additional advantage is throughput: if you had hundreds of such images, perhaps representing multiple randomly selected fields from many samples, you could process in a few minutes what would take many hours to count by hand.
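If you did have such a pile of images, a minimal batch sketch might look like this. The file names here are hypothetical, each image is assumed to have already been converted with the Python script above, and we reuse the trained som.model, target.units, and findROI from this tutorial.

for(name in c('image1', 'image2', 'image3')){
  scope.rgb <- read.csv(paste0(name, '.rgb4r.csv'), header = F)
  scope.xy <- read.csv(paste0(name, '.xy.csv'), header = F)
  colnames(scope.rgb) <- c('red', 'green', 'blue')
  colnames(scope.xy) <- c('X', 'Y')
  
  ## classify pixels and aggregate into ROIs, exactly as above
  
  image.predict <- predict(som.model, newdata = list('rgb' = data.matrix(scope.rgb)), whatmap = 'rgb')
  target.pixels <- scope.xy[which(image.predict$unit.classif %in% target.units), c('X', 'Y')]
  
  roi.pixels <- findROI()
  print(paste0(name, ': ', length(table(roi.pixels$roi)), ' ROIs'))
}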

At this point I can feel the collective judgement of every environmental microbiologist since van Leeuwenhoek for promoting a method that might reduce the serendipitous discovery that comes with spending hours staring through a microscope. So here's a reminder to spend time getting familiar with your samples under the microscope, regardless of how you identify ROIs!

Posted in Computer tutorials | Leave a comment

Tutorial: Self Organizing Maps in R

Self-organizing maps (SOMs) are a form of neural network and a wonderful way to partition complex data.  In our lab they’re a routine part of our flow cytometry and sequence analysis workflows, but we use them for all kinds of environmental data (like this).  All of the mainstream data analysis languages (R, Python, Matlab) have packages for training and working with SOMs.  My favorite is the R package kohonen, which is simple to use but can support some fairly complex analysis through SOMs with multiple data layers and supervised learning (superSOMs).  The kohonen package has a nice, very accessible paper that describes its key features, and some excellent examples.  This tutorial applies our basic workflow for a single-layer SOM to RGB color data.  RGB color space segmentation is a popular way to evaluate machine learning algorithms, as it is intrinsically multivariate and inherently meaningful.  Get like colors grouping together and you know that you’ve set things up correctly!

This application of SOMs has two steps.  Each of these steps can be thought of as an independent data reduction step.  It’s important to remember that you’re not reducing dimensions per se, as you would in a PCA, you’re aggregating like data so that you can describe them as convenient units (instead of n individual observations).  The final outcome, however, represents a reduction in dimensionality to a single parameter for all observations (e.g., the color blue instead of (0, 0, 255) in RGB colorspace).  The first step – training the SOM – assigns your observations to map units.  The second step – clustering the map units into classes – finds the structure underlying the values associated with the map units after training.  At the end of this procedure each observation belongs to a map unit, and each map unit belongs to a class.  Thus each observation inherits the class of its associated map unit.  If that’s not clear don’t sweat it.  It will become clear as you go through the procedure.

First, let’s generate some random RGB data.  This takes the form of a three column matrix where each row is a pixel (i.e. an observation).

#### generate some RGB data ####

## select the number of random RGB vectors for training data

sample.size <- 10000

## generate dataframe of random RGB vectors

sample.rgb <- as.data.frame(matrix(nrow = sample.size, ncol = 3))
colnames(sample.rgb) <- c('R', 'G', 'B')

sample.rgb$R <- sample(0:255, sample.size, replace = T)
sample.rgb$G <- sample(0:255, sample.size, replace = T)
sample.rgb$B <- sample(0:255, sample.size, replace = T)

Next, we define a map space for the SOM and train the model.  Picking the right grid size for the map space is non-trivial; you want about 5 elements from the training data per map unit, though you’ll likely find that they’re not uniformly distributed.  It’s best to use a symmetrical map (unless you have a very small training dataset), hexagonal map units, and a toroidal shape.  The latter is important to avoid edge effects (a toroid has no edges).

One important caveat for the RGB data is that we’re not going to bother with any scaling or normalizing.  The parameters are all on the same scale and evenly distributed between 0 and the max value of 255.  Likely your data are not so nicely formed! 
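If your data did need it, something like this before training is typical (a generic sketch, not required for the RGB example):

## center and scale each column to mean 0 and unit variance

sample.scaled <- scale(data.matrix(sample.rgb))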

#### train the SOM ####

## define a grid for the SOM and train

library(kohonen)

grid.size <- ceiling(sample.size ^ (1/2.5))
som.grid <- somgrid(xdim = grid.size, ydim = grid.size, topo = 'hexagonal', toroidal = T)
som.model <- som(data.matrix(sample.rgb), grid = som.grid)

Once you’ve trained the SOM it’s a good idea to explore the output of the `som` function to get a feel for the different items in there.  The output takes the form of nested lists.  Here we extract a couple of items that we’ll need later, and also create a distance matrix of the map units.  We can do this because the fundamental purpose of map units is to have a codebook vector that mimics the structure of the training data.  During training each codebook vector is iteratively updated along with its neighbors to match the training data.  After sufficient iterations the codebook vectors reflect the underlying structure of the data.
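A couple of quick ways to do that exploring (an optional aside; `str` is base R, and the 'changes' plot type is part of kohonen):

str(som.model, max.level = 1)     # top-level structure of the model's nested lists
plot(som.model, type = 'changes') # mean distance to closest codebook vector by training iteration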

## extract some data to make it easier to use

som.events <- som.model$codes[[1]]
som.events.colors <- rgb(som.events[,1], som.events[,2], som.events[,3], maxColorValue = 255)
som.dist <- as.matrix(dist(som.events))

Now that we have a trained SOM let’s generate a descriptive plot.  Since the data are RGB colors, if we color the plot accordingly it should be sensible.  For comparison, we first create a plot with randomized codebook vectors.  This represents the SOM at the start of training.

## generate a plot of the untrained data.  this isn't really the configuration at first iteration, but
## serves as an example

plot(som.model,
     type = 'mapping',
     bg = som.events.colors[sample.int(length(som.events.colors), size = length(som.events.colors))],
     keepMargins = F,
     col = NA,
     main = '')

And now the trained SOM:

## generate a plot after training.

plot(som.model,
     type = 'mapping',
     bg = som.events.colors,
     keepMargins = F,
     col = NA,
     main = '')

So pretty! The next step is to cluster the map units into classes.  As with all clustering analysis, a key question is how many clusters (k) we should define.  One way to inform our decision is to evaluate the distance between all items assigned to each cluster for many different values of k.  Ideally, creating a scree plot of mean within-cluster distance vs. k will yield an inflection point that suggests a meaningful value of k.  In practice this inflection point is extremely sensitive to the size of the underlying data (in this case, the number of map units); however, it can be a useful starting place.  Consider that the RGB data were generated randomly, meaning that there is no underlying structure!  Nonetheless we still get an inflection point.

#### look for a reasonable number of clusters ####

## Evaluate within cluster distances for different values of k.  This is
## more dependent on the number of map units in the SOM than the structure
## of the underlying data, but until we have a better way...

## Define a function to calculate mean distance within each cluster.  This
## is roughly analogous to the within clusters ss approach

clusterMeanDist <- function(clusters){
  cluster.means = c()
  
  for(c in unique(clusters)){
    temp.members <- which(clusters == c)
    
    if(length(temp.members) > 1){
      temp.dist <- som.dist[temp.members,]
      temp.dist <- temp.dist[,temp.members]
      cluster.means <- append(cluster.means, mean(temp.dist))
    }else{cluster.means <- append(cluster.means, 0)} # a singleton cluster contributes a distance of zero
  }
  
  return(mean(cluster.means))
  
}

library(vegan) # provides vegdist

try.k <- 2:100
cluster.dist.eval <- as.data.frame(matrix(ncol = 3, nrow = (length(try.k))))
colnames(cluster.dist.eval) <- c('k', 'kmeans', 'hclust')

for(i in 1:length(try.k)) {
  cluster.dist.eval[i, 'k'] <- try.k[i]
  cluster.dist.eval[i, 'kmeans'] <- clusterMeanDist(kmeans(som.events, centers = try.k[i], iter.max = 20)$cluster)
  cluster.dist.eval[i, 'hclust'] <- clusterMeanDist(cutree(hclust(vegdist(som.events)), k = try.k[i]))
}

plot(cluster.dist.eval[, 'kmeans'] ~ try.k,
     type = 'l')

lines(cluster.dist.eval[, 'hclust'] ~ try.k,
      col = 'red')

legend('topright',
       legend = c('k-means', 'hierarchical'),
       col = c('black', 'red'),
       lty = c(1, 1))

Having picked a reasonable value for k (let’s say k = 20) we can evaluate different clustering algorithms.  For our data k-means almost always performs best, but you should choose what works best for your data.  Here we’ll evaluate k-means, hierarchical clustering, and model-based clustering.  What we’re looking for in the plots is a clustering method that produces contiguous classes.  If classes are spread all across the map, then the clustering algorithm isn’t capturing the structure of the SOM well.

#### evaluate clustering algorithms ####

## Having selected a reasonable value for k, evaluate different clustering algorithms.

library(pmclust)

## Define a function to make a simple plot of clustering output.
## This is the same as the previous plotting, but we define the function
## here because we wanted to play with the colors earlier.

plotSOM <- function(clusters){
  plot(som.model,
       type = 'mapping',
       bg = som.events.colors,
       keepMargins = F,
       col = NA)
  
  add.cluster.boundaries(som.model, clusters)
}

## Try several different clustering algorithms, and, if desired, different values for k

cluster.tries <- list()

for(k in c(20)){
  
  ## model based clustering using pmclust
  
  som.cluster.pm.em <- pmclust(som.events, K = k, algorithm = 'em')$class # model based
  som.cluster.pm.aecm <- pmclust(som.events, K = k, algorithm = 'aecm')$class # model based
  som.cluster.pm.apecm <- pmclust(som.events, K = k, algorithm = 'apecm')$class # model based
  som.cluster.pm.apecma <- pmclust(som.events, K = k, algorithm = 'apecma')$class # model based
  som.cluster.pm.kmeans <- pmclust(som.events, K = k, algorithm = 'kmeans')$class # model based
  
  ## k-means clustering
  
  som.cluster.k <- kmeans(som.events, centers = k, iter.max = 100, nstart = 10)$cluster # k-means
  
  ## hierarchical clustering
  
  som.dist <- dist(som.events) # hierarchical, step 1
  som.cluster.h <- cutree(hclust(som.dist), k = k) # hierarchical, step 2
  
  ## capture outputs
  
  cluster.tries[[paste0('som.cluster.pm.em.', k)]] <- som.cluster.pm.em
  cluster.tries[[paste0('som.cluster.pm.aecm.', k)]] <- som.cluster.pm.aecm
  cluster.tries[[paste0('som.cluster.pm.apecm.', k)]] <- som.cluster.pm.apecm
  cluster.tries[[paste0('som.cluster.pm.apecma.', k)]] <- som.cluster.pm.apecma
  cluster.tries[[paste0('som.cluster.pm.kmeans.', k)]] <- som.cluster.pm.kmeans
  cluster.tries[[paste0('som.cluster.k.', k)]] <- som.cluster.k
  cluster.tries[[paste0('som.cluster.h.', k)]] <- som.cluster.h
}

## Take a look at the various clusters.  You're looking for the algorithm that produces the
## least fragmented clusters.

plotSOM(cluster.tries$som.cluster.pm.em.20)
plotSOM(cluster.tries$som.cluster.pm.aecm.20)
plotSOM(cluster.tries$som.cluster.pm.apecm.20)
plotSOM(cluster.tries$som.cluster.pm.apecma.20)
plotSOM(cluster.tries$som.cluster.k.20)
plotSOM(cluster.tries$som.cluster.h.20)

For brevity I’m not showing the plots produced for all the different clustering algorithms. For these data the k-means and hierarchical clustering algorithms both look pretty good; I have a slight preference for the k-means version:

The SOM and final map unit clustering represent a classification model that can be saved for use with later data.  One huge advantage to using SOMs over other analysis methods (e.g., ordination techniques) is their usefulness for organizing newly collected data.  New data, if necessary scaled and normalized in the same way as the training data, can be classified by finding the map unit with the minimum distance to the new observation.  To demonstrate this, we’ll generate and classify a small new RGB dataset (in reality classifying in this way is very efficient, and could accommodate a huge number of new observations).  First, we save the SOM and final clustering.

## The SOM and map unit clustering represent a classification model.  These can be saved for
## later use.

som.cluster <- som.cluster.k
som.notes <- c('Clustering based on k-means')

save(file = 'som_model_demo.Rdata', list = c('som.cluster', 'som.notes', 'som.model', 'som.events.colors'))

Then we generate new RGB data, classify it, and make a plot to compare the original data, the color of the winning map unit, and the color of the cluster that map unit belongs to.

#### classification ####

## make a new dataset to classify ##

new.data.size <- 20
new.data <- as.data.frame(matrix(nrow = new.data.size, ncol = 3))
colnames(new.data) <- c('R', 'G', 'B')

new.data$R <- sample(0:255, new.data.size, replace = T)
new.data$G <- sample(0:255, new.data.size, replace = T)
new.data$B <- sample(0:255, new.data.size, replace = T)

## get the closest map unit to each point

new.data.units <- map(som.model, newdata = data.matrix(new.data))

## get the classification for closest map units

new.data.classes <- som.cluster[new.data.units$unit.classif]

## compare colors of the new data, unit, and class, first define a function
## to calculate the mean colors for each cluster

clusterMeanColor <- function(clusters){
  cluster.means = c()
  som.codes <- som.model$codes[[1]]
  
  for(c in sort(unique(clusters))){
    temp.members <- which(clusters == c)
    
    if(length(temp.members) > 1){
      temp.codes <- som.codes[temp.members,]
      temp.means <- colMeans(temp.codes)
      temp.col <- rgb(temp.means[1], temp.means[2], temp.means[3], maxColorValue = 255)
      cluster.means <- append(cluster.means, temp.col)
    }else({
      temp.codes <- som.codes[temp.members,]
      temp.col <- rgb(temp.codes[1], temp.codes[2], temp.codes[3], maxColorValue = 255)
      cluster.means <- append(cluster.means, temp.col)
      })
  }
  
  return(cluster.means)
  
}

class.colors <- clusterMeanColor(som.cluster)

plot(1:length(new.data$R), rep(1, length(new.data$R)),
     col = rgb(new.data$R, new.data$G, new.data$B, maxColorValue = 255),
     ylim = c(0,4),
     pch = 19,
     cex = 3,
     xlab = 'New data',
     yaxt = 'n',
     ylab = 'Level')

axis(2, at = c(1, 2, 3), labels = c('New data', 'Unit', 'Class'))

points(1:length(new.data.units$unit.classif), rep(2, length(new.data.units$unit.classif)),
     col = som.events.colors[new.data.units$unit.classif],
     pch = 19,
     cex = 3)

points(1:length(new.data.classes), rep(3, length(new.data.classes)),
       col = class.colors[new.data.classes],
       pch = 19,
       cex = 3)

Looks pretty good! Despite defining only 20 classes, the class colors are a reasonable representation of the original data. Only slight differences in color can be observed between the data, the winning map units, and the classes.
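As a final hedged sketch, here's how the saved model might be reloaded and applied in a later session; more.rgb stands in for a new three-column RGB matrix built as above.

library(kohonen)

load('som_model_demo.Rdata') # restores som.cluster, som.notes, som.model, som.events.colors

more.data.units <- map(som.model, newdata = data.matrix(more.rgb)) # more.rgb is hypothetical new data
more.data.classes <- som.cluster[more.data.units$unit.classif]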

Posted in Computer tutorials | 19 Comments

MOSAiC Leg 3

Where to begin… I started writing this post several days ago on a nearly empty plane flying from Charlotte, N.C. to San Diego. That flight marked the end of MOSAiC Leg 3 for me, though Leg 3 will continue for several more weeks. We were supposed to be done on April 4, however, the COVID-19 pandemic and dynamic sea ice conditions pretty well hashed that plan. I took advantage of a single opportunity to leave the Polarstern early (meaning only a couple of weeks late) to help with the childcare situation at home. The remaining brave and dedicated crew and scientists are expected to return by ship to Europe in late May.

Tromsø, Norway. High on my list of nice places.

Our journey began in Tromsø, Norway in the innocent days of January when COVID-19 seemed like a regional rather than global problem. We had several Chinese expedition members on Leg 3, but they all made it out just fine and – though they were concerned about the situation back home – we figured things would improve in due time. After a week of safety training we were transported to the Russian icebreaker Kapitan Dranitsyn for what we thought would be a 2-3 week voyage to the Polarstern.

Leg 3 aboard the Kapitan Dranitsyn in Tromsø on January 28.

A lot of thought and effort went into determining how the MOSAiC scientists and crew would reach Polarstern throughout the drift. During the winter a conventional icebreaker of Dranitsyn’s class is not the best way to reach a location in the central Arctic; however, it was the best among the affordable and available options. And in the end it did the job. I’m not sure of the history, but I’m fairly certain that its feat of reaching Polarstern in the dead of winter is unprecedented.

The Dranitsyn plows through “mild” winter weather in the Barents Sea on February 5.

We spent a week in a pleasant fjord just outside of Tromsø waiting for a weather break to cross the Barents Sea to the pack ice. The Barents Sea is notoriously stormy, and we needed wave heights below 4 m to attempt the crossing. Ice breakers don’t ride well in heavy seas, and there were refrigerated shipping containers carrying (our) food for Polarstern stored on deck in the bow. We couldn’t take big waves on the bow, nor could we tolerate much ice formation on the chilling units. Both happened anyway. But at least the crossing was relatively short, and within 72 hours we were in the pack ice.

Dranitsyn on February 6, shortly after entering the pack ice.

The Dranitsyn ended up being a bigger part of Leg 3 than any of us imagined at the time. Had our departure from Polarstern gone as expected, we would have been in transit on Dranitsyn as long as we were doing science on Polarstern. It was an interesting experience. Scientists are pretty good at keeping themselves busy; most of us have a long backlog of analyses to complete and papers to write, in addition to fielding emails from students and colleagues, and handling administrative matters. Absent the internet or email, however, a lot of these responsibilities disappeared. I worked on a proposal and finished a (3 year overdue) paper. Much of the rest of the 5 weeks we were on Dranitsyn I spent in discussion with my Leg 3 colleagues. It was something like an extended MOSAiC workshop. We covered everything from synthesizing across different science themes, to how time and on-ice resources would be shared between groups in a typical week.

The Dranitsyn on February 23, deep in the Arctic ice.

The roughly 2 week delay in reaching the Polarstern was due in large part to ice conditions. Despite previous expeditions onboard icebreakers I hadn’t really thought about it this way before, but the ice compresses and relaxes in response to tides, winds, and other external forces. When the ice is compressed it’s incredibly difficult for an icebreaker to break; there’s simply no place for the ice to be displaced to. Under a relaxed state more leads are open, and there’s more space within the pack ice to accommodate the displaced ice as the ship moves through. Temperature also plays a role. Icebreakers are more typically used during the spring, summer, and fall, when maritime shipping in the Arctic is active and more research activities are taking place. Spring, summer, and fall sea ice is naturally much warmer than winter sea ice. Warmer sea ice is softer and breaks more easily, and the surface of the ice has less friction. This means an icebreaker can more easily ride up and over the ice to break it. With temperatures low and the ice in a “compressed” state it was a tough grind, as this video from our embedded videographers from the UFA production company shows:

Despite the conditions and some weeks of uncertainty we did eventually make it to Polarstern. Seeing that little point of light appear on the horizon after all those weeks of travel made me appreciate how alone we were up there at that time of year.

Dranitsyn approaching Polarstern on February 28. It was another week before we finally said good-bye to the Dranitsyn and settled into our new home in the Arctic.

After we reached Polarstern it took nearly a week to transfer cargo, exchange the scientists and crew, and become familiar enough with our tasks and the surroundings to take over. Many of the MOSAiC observations take place in what’s called the central observatory. This is an aggregation of relatively stable floes around Polarstern that house a number of on-ice installations. These include the colloquially named Met City, Balloon Town, Ocean City, and Droneville sites, among others. Beyond the central observatory are various “pristine” sites for snow and ice sampling, and beyond those lie the nodes of the less frequently visited distributed observatory. The distributed observatory is critical because it provides some spatial context to the intensive observations of the central observatory.

Preparing for a CTD cast on March 6, shortly after the departure of Dranitsyn. In between CTD casts the hole is covered by the Weatherhaven, which has to be picked up and moved out of the way for each deployment. Note the location of the gangway relative to the CTD hole (it’s going to change!). Immediately behind the CTD hole is the logistics zone for staging equipment. In the background, just behind the green flag, you can make out the Ocean City and Balloon Town sites.

We had about a week of “normal” operations before the central Arctic started throwing plot twists at us. The ice was surprisingly dynamic. The reasons behind this will, I think, be an important science outcome of MOSAiC. Thinner, rougher sea ice? More wind stress? Whatever the cause the ice cracked a lot. By chance we were located in what ice dynamicists call a “shear zone”, an area of enhanced kinetic energy within the pack. Here’s the first emergence of a crack, on March 11. You can see the logistics team scrambling to move the snow machines to a safer location. Over the next few weeks this crack grew into a major and ever-evolving lead.

Leg 3’s first encounter with a major crack in the central observatory, on March 11.

By April everyone was pretty used to cracks, leads, and ridges in the central observatory. Overall I was impressed with the resilience of the various teams as different installations were threatened, and in some cases destroyed. For the ATMOS (atmospheric sciences) team in particular, Leg 3 should have been relatively easy – low temps notwithstanding. Everyone expected consolidated ice and stable, well-established instruments and protocols. Instead, for a period of time there was a near-daily scramble to maintain power and infrastructure at the Met City site. The adjacent Remote Sensing site was enveloped by a large ridge system and had to be relocated to near the logistics area. Setting up these sites took specialists on Leg 1 many days; on Leg 3 systems had to be dismantled and re-established on the fly. Because spring is a particularly interesting time for atmospheric chemistry in the Arctic, the clock was ticking every time a site or instrument went down. The dedication and ingenuity of the scientists at the Met City and Remote Sensing sites was great to observe. The rest of us helped where we could, but we had our hands pretty full with other problems.

Polarstern crew wrangle power cables that have become trapped between the ship and the floe on March 23. Maintaining the power supply from the ship to the various on-ice installations was a huge challenge given the dynamic ice conditions.

At the top of the problems list was the loss of the hole for the main CTD/rosette system. The CTD is an instrument package that measures conductivity, temperature, and pressure (depth) along with other parameters. It is the fundamental tool in ship-based oceanography. The CTD is attached to a long conductive wire and embedded within a rosette of sample bottles. The sample bottles are fired at specific depths to collect water for a number of analyses and experiments. Lots of parameters and projects were dependent on this sampling system, and a huge amount of effort had been expended to construct and maintain a hole in the ice for deploying it. On March 15, however, the ice shifted, pushing Polarstern forward. This caused superficial damage to the Weatherhaven covering the hole, but more significantly placed the hole out of reach of the crane that operates the CTD/rosette. Just like that we lost all of our capacity to sample below 1000 m (the central Arctic Ocean is deeper than 4000 m in most places) and to collect large volumes of water from any depth. All sampling had to shift to a much smaller system at the Ocean City site.

On March 15 shifting ice moves the Polarstern forward, rendering the main CTD hole useless.

Why couldn’t we simply make a new hole? It’s worth remembering that the ice in March is near its maximum thickness. It was roughly 160 cm thick when access to the main CTD hole was lost. This discounts any rafting of multiple ice floes, which was probable, and could easily double or triple the thickness. Assuming only a single layer of ice, the way to make a hole big enough for the CTD is with a chainsaw. The thickness of the ice that you can cut is limited by the length of the chainsaw bar. Maybe commercial logging operations have a 2 m bar; we certainly did not! You can cut the ice out in layers – I’ve had to do this in the Antarctic before – but the problem is that you create a bathtub as soon as you start cutting the final layer. To finish the job you’d need a snorkel for yourself and the chainsaw!

Sampling at Ocean City on March 10.

All our water column sampling had to shift to Ocean City, and focus on the upper 1000 m of the water column. Ocean City is the main site for the physical oceanography team. It was designed to accommodate a small team taking ultra-high resolution measurements of the surface ocean. The physical oceanographers went above and beyond sharing their space and resources, and I ended up thoroughly enjoying the time that I spent out at Ocean City. The below video was made on April 20 by lowering a GoPro through the CTD hole at Ocean City while one of the physical oceanographers conducted high resolution profiling of temperature and salinity. You can see the microstructure profiler used for this near the end of the sequence.

In addition to the water column sampling we carried out sea ice sampling, when conditions allowed. To minimize the impact of light pollution from the vessel on the growth of sea ice algae our preferred ice coring sites were located some distance from the ship. Through the spring and summer, most of the photosynthesis taking place in the central Arctic occurs in the ice itself, rather than the water column. The ice algae have more consistent access to light than their planktonic counterparts, and are famously sensitive to even the lowest levels of light. Ambient light from the ship is more than enough to induce growth in the vicinity during the long polar night. Distance from the ship combined with the dynamic ice conditions created some access challenges.

Delicate maneuvering with a full load of ice cores on April 6. Photo: Eric Brossier.

Despite the access challenges we got some great ice core samples. We fielded two ice coring teams, one for first year ice and one for second year ice. I had the pleasure of working with the second year ice coring team. It was a great US-German-Russian collaborative effort, and we had some good times out there!

Laura Wischnewski and I section sea ice cores. Photo: Eric Brossier.
The combined Leg 3 first year and second year ice coring teams.

The original plan for exchanging Leg 3 with Leg 4 involved flying us all out on ski-equipped Antonov An-74 aircraft. This would have been a slick and expedient way to carry out the exchange. It also requires a pretty long runway and permissive global travel. By mid-March it was clear that both of these things were going to be an issue. I’ll be honest, there were some tense weeks where it wasn’t clear when and how Leg 3 would end, and what the future of MOSAiC would be. Kudos to cruise and expedition leadership for navigating us through the ups and downs. In particular AWI logistics had the difficult task of designing and scoping the possible solutions. They did an amazing job of iteratively working through a huge range of options to come up with the one that maximized science and minimized impacts on individual lives. But of course it was a compromise.

The current plan involves the Polarstern leaving the central observatory in mid-May for a rendezvous with one or more ships near Svalbard. The Leg 4 personnel (including Bowman Lab member Emelia Chamberlain) are already under strict quarantine in Bremerhaven, Germany. They’ll remain under quarantine until they depart for Svalbard at roughly the same time Polarstern leaves the central observatory. Once the crew has been exchanged, Leg 3 will sail for Germany and Leg 4 will begin the difficult task of re-establishing observations at the central observatory. An advantage of this plan is that it doesn’t require a complete breakdown of the central observatory. It will require, however, that many of the installations be partially disassembled for safety while Polarstern is away from the floe.

This is how you get a Twin Otter to the central Arctic.

There was one opportunity to leave Polarstern before the official Leg 3-4 exchange. After agonizing over it for a couple of days I decided I needed to take advantage of the opportunity. After an epic few weeks our project was in decent shape, and with two young kids and no school or daycare, attention needed to shift to the home front. On April 22 I stepped onto a Twin Otter operated by Kenn Borek Air Ltd. to begin the long flight home with six other expedition members. We flew to Station Nord in Greenland, then across the Canadian Arctic via Eureka-Resolute-Arctic Bay-Churchill-Toronto, and finally to the US.

Immigration: “You’re coming from where?”

Me: “Resolute”

Immigration: “What were you doing in Resolute?”

Me: “Just passing through, we were only there for a few minutes”

Immigration: “So where were you before that?”

Me: “Greenland, but again not very long. See there’s this ship…”

Immigration: “Uh, nevermind. Here’s your passport.”

Spectacular view of the northern Greenland coastline on the approach to Station Nord. Note the obvious interface between the landfast sea ice and the drifting pack ice. This feature is part of the circumpolar flaw-lead system and extended as far as we could see in either direction.
Posted in MOSAiC | 1 Comment

CUREing microbes ASBMB2020 presentation

In a sign of the times here’s Ana Barral’s virtual presentation for the 2020 American Society for Biochemistry and Molecular Biology meeting (held remotely). It’s a nice summary of the initial results of our course-based undergraduate research experience (CURE) on microbes on ocean plastics.

Posted in Uncategorized | Leave a comment

“Frozen in the Ice: Exploring the Arctic” – a MOSAiC MOOC

Check out the lectures posted by our team as part of the MOSAiC-affiliated Massive Open Online Course. This course, available via both Coursera and YouTube, was produced by the University of Colorado Boulder in partnership with the Alfred Wegener Institute and with funding from the National Science Foundation. It explores current research surrounding the Arctic ocean-ice-atmosphere system, as well as the questions driving the MOSAiC International Arctic Drift Expedition.

CIRES/AWI

Module 5 Lectures 3-4: Learn from Jeff about the microbial communities living in sea ice.

Module 5 Lecture 7: Learn from Dr. Jessie Creamean (Leg 1 MOSAiC participant) about atmospheric aerosols in the Arctic.

Module 5 Lecture 8: Learn from Dr. Brice Loose (Co-PI of our NSF project) about why it is important to study the Arctic in the first place.

Check it out!! –> Coursera & YouTube

Posted in Uncategorized | Leave a comment

Five lessons from my first quarter of graduate school

In the weeks leading up to starting at Scripps Institution of Oceanography, where I am now a first year PhD student, I often found myself – for better or worse – turning to Google for advice on navigating the next five years. I would google any anxiety-fueled question that popped into my brain, from “What is the hardest year of a PhD?” to “How competitive is a PhD cohort?” I found the best answers online were from older students, who used the space to reflect on their recent experiences and offer up what they had learned along the ride. In that spirit, I hope some recently accepted graduate student stumbles onto this as they furiously google, and that it offers comfort and (hopefully) wisdom for a fun and tumultuous transition.

Below is a list of five things I really learned – and relearned – as a first quarter grad student. I am only one quarter done with a potentially six-year degree (which, if we do some quick GRE math, means I am one quarter of one sixth done, or a little over 4 percent), so this is by no means an exhaustive list. I’m actually really interested to see which of the five becomes more important as the years go on, and please comment if, as a graduate student, you think I missed something important.

Five lessons from the first quarter of graduate school:

Find help.  This was, I think, the most often repeated piece of advice I saw going into my first year, and it really holds up. Finding people I could be myself around – people to whom I could ask stupid questions and with whom I could lament failed experiments – was essential to my first quarter survival. My best memories from a packed schedule of lab and class are from study groups and group lab coding sessions. I also got involved on campus once I felt settled, which was a great way to meet and work alongside older students and mentors.

Planning ahead saves money. I worked in a lab as an undergrad and as a tech once I graduated, but I was never in charge of purchasing lab equipment or supplies. This quarter marked the transition for me from being an ignorant consumer of lab supplies to a conscientious one, now that I know exactly how expensive it all is and how much money you can waste on a poorly designed experiment (extra hint: include controls!). More generally, the application process for PhDs includes applying for grants, which is just the beginning of learning to apply for and manage money as a researcher. I’ve realized I have a lot to learn about budgeting and management in my journey to become a successful scientist.

Grades aren’t everything anymore.  It was a hard habit to break, but all of your time shouldn’t be spent studying for your general first year classes. I learned to diversify how I obtained knowledge. Reading scientific papers and attending seminars from visiting professors were places where I learned the most this quarter. An afternoon spent reading a paper closely related to your research or an hour attending an interesting seminar often meant more to me than studying for a midterm in my more general oceanography classes.

Say yes. I am writing this from a ship off the coast of Antarctica, where I am conducting field work, all because my advisor asked if I wanted to go and I said yes. Saying yes to collecting samples in the field is one example, but even for something simpler but still scary – like a surfing class or going to a social event where you don’t know anyone – just say yes.

You will make mistakes. I think the biggest lesson I learned from the first few months of grad school was how often you make mistakes. It is a daily (sometimes hourly) part of life, in both lab and in class. I am still working on how to learn from and move past mistakes, both large and small.

I think since I am still in my first year I have yet to really experience burnout or writer’s block, which I know happen often to older PhD students. I feel so fortunate to be able to study and do science as a Bowman lab member at Scripps, and I hope my insights from a great first quarter help put any prospective students reading this at ease.

Posted in Uncategorized | 1 Comment

Photos from MOSAiC Leg 1

Many thanks to atmospheric chemist extraordinaire Jessie Creamean for participating in our NSF project on MOSAiC Leg 1. Jessie’s participation allowed us to have a physical presence during the critical setup phase and freeze-in. Her participation was a double win; she has her own DOE-funded project with MOSAiC that didn’t include ship time, while our project needed a capable hand for Leg 1. I’ll be picking up where she left off when Leg 3 starts in just a couple of weeks. Jessie shared a few photos from Leg 1.

Jessie Creamean with the Akademik Fedorov in the background.
Checking out a crack in the ice. The ice has been much more dynamic than expected, creating some problems for installing the various observational instruments.
A beautiful view of Polarstern at the onset of the polar night.
Frost flowers! Still a special place in my heart after all these years…
The Russian icebreaker Kapitan Dranitsyn, which will be ferrying the Leg 2 and Leg 3 personnel to Polarstern.
Posted in MOSAiC | Leave a comment

CURE-ing Microbes on Ocean Plastics Video

Over the last few months volunteer diver Caitlyn Webster has been putting together a quick outreach video on our CURE-ing Microbes on Ocean Plastics project with National University. In addition to highlighting the project, Caitlyn provides a nice overview of the issue of plastics in the ocean, and some common misconceptions.

Posted in Uncategorized | Leave a comment