A year in the CanDIG project

Last month, I completed a year on my current project, CanDIG. The project is about building a platform that connects "data lakes" within Canadian institutions in a federated manner. I was hired mostly to work on federated authorization, which you can read about in this blog post on CanDIG's blog.

For those who do not know, I have spent the past few years in the genomics area, but mostly on the software infrastructure side. Some of the biology and genomics knowledge that I have is courtesy of Donna Maglott at NCBI.

I left the United States for Canada once I found a project that I was excited about, and, of course, so that I would never again have to worry about the H-1B visa or the US immigration system. Having worked on the team for a year, I have learned a few general lessons that I think are worth sharing.

Value matching with the team and project

Even though I have mostly worked solo due to the nature of AAI (authentication and authorization infrastructure) work, it is a tremendous help to work in a team whose values match your own. Collaborating with colleagues on projects that overlap with my goals makes me happy. One example is the "Dynamic Consents" project with an esteemed colleague, which allows patients to give their consent and later change it (adjusting its granularity or revoking it entirely) while taking into account aspects like the accessibility of remote communities in Canada. I love that our team considers the various social issues that hinder marginalized communities' participation, representation, and ownership of their data.

Love-Hate relationship with software development in science

At the time of this writing, scientific computing and software development have improved a lot. Organizations like RSE are helping move scientific software to a better place. However, much remains to be desired. This is also good news for people like me, because it lets us voice our concerns and keep our jobs.

I also believe that, most of the time, a scientific software developer gets to work on a variety of projects and tasks, which keeps tech life less monotonous. I understand that, for some people, shaving Python yaks is monotonous too, but as long as I get leeway to learn and grow, I am happy.

The part that I am (still) most concerned about is licensing. Many professors in many places are not happy with copyleft licenses, and I am not sure why. I have written a blog post asking this question about copyleft licenses in scientific software. I still do not have a good answer as to why we should avoid copyleft licenses when they serve Science better than other licenses and represent its true spirit.

Don't roll your own crypto

Much of the group's work was related to AAI, and there are plenty of things in that space you could build yourself. However, there are likely existing tools you can leverage instead of writing your own. The point is that the advice "don't roll your own crypto" is not just about cryptographic algorithms or PKI. It also applies to small things like issuing or signing JWTs. Yes, you read that right: JWTs! Please don't judge; we need JWTs because we use OIDC (more on that later). It is very easy to slip into "build your own" for smaller things. So far, we have avoided that by using Keycloak, Tyk, and existing libraries for differential privacy, among others, and we hope to continue on that trajectory with products like HashiCorp Vault.
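
To illustrate the point, here is a minimal sketch of issuing and verifying a JWT in Python with the PyJWT library instead of hand-assembling headers, base64url encoding, and HMAC signatures. It assumes PyJWT 2.x is installed; the secret, issuer URL, and audience are hypothetical placeholders, and this is not CanDIG's code. In practice you would more likely let an identity provider such as Keycloak issue tokens signed with asymmetric keys.

```python
import datetime

import jwt  # PyJWT 2.x, assumed installed (pip install pyjwt)

# Hypothetical values for illustration only. A real deployment would keep
# keys in a secrets manager and usually use asymmetric keys (RS256/ES256).
SECRET = "example-secret-for-illustration-only-0123456789"
ISSUER = "https://auth.example.org"
AUDIENCE = "example-service"


def issue_token(subject: str, lifetime_s: int = 300) -> str:
    """Issue a short-lived signed token for `subject`."""
    now = datetime.datetime.now(tz=datetime.timezone.utc)
    claims = {
        "sub": subject,
        "iss": ISSUER,
        "aud": AUDIENCE,
        "iat": now,
        "exp": now + datetime.timedelta(seconds=lifetime_s),
    }
    # The library handles the header, base64url encoding, and signing,
    # and converts the datetime claims to numeric timestamps.
    return jwt.encode(claims, SECRET, algorithm="HS256")


def verify_token(token: str) -> dict:
    """Verify signature, expiry, audience, and issuer; return the claims."""
    # Pin the accepted algorithms rather than trusting the token's header.
    return jwt.decode(
        token,
        SECRET,
        algorithms=["HS256"],
        audience=AUDIENCE,
        issuer=ISSUER,
    )


token = issue_token("alice")
print(verify_token(token)["sub"])  # alice
```

Note that `verify_token` pins the accepted algorithm list instead of honoring whatever the token's header claims, one of the small details that hand-rolled JWT code commonly gets wrong.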

Federated, Distributed and other overloaded terms

I never thought I would even look at something like OIDC in my career, but here I am. This year, I achieved two things:

  1. Connected resources (e.g. genomic services) across the Atlantic (Europe and Canada), allowing users on both sides to log into each other's instances and be authorized at different levels using only OIDC. This could be called federated authorization.
  2. Built the first draft of authorization for CanDIG, a distributed infrastructure that connects multiple sites within Canada to share genomic data. This, too, can be called federated authorization.

The European infrastructure is much more centralized, so the challenges of tying it to our infrastructure differ from those of what we are building within CanDIG and Canada. What we are doing within CanDIG is much more difficult, because we do not have the leeway of a centralized system, where an alias of the external user can be created to facilitate communication. The terms are definitely overloaded!
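
As a rough illustration of what federated authorization with OIDC-issued tokens can look like on the receiving end, here is a sketch of a service that accepts tokens from more than one identity provider and maps each trusted issuer to an access level. Everything here is hypothetical: the issuer URLs, the access-level names, and the use of shared HMAC secrets (real OIDC providers publish asymmetric keys via a JWKS endpoint). It assumes PyJWT 2.x and is not how CanDIG or the European infrastructure actually implements it.

```python
import jwt  # PyJWT 2.x, assumed installed (pip install pyjwt)

# Hypothetical registry of trusted identity providers and the access level
# each one grants. Shared secrets keep the sketch self-contained; a real
# deployment would fetch each issuer's public keys from its JWKS endpoint.
TRUSTED_ISSUERS = {
    "https://idp.example.eu": {
        "key": "eu-secret-for-illustration-only-0123456789",
        "level": "registered",
    },
    "https://idp.example.ca": {
        "key": "ca-secret-for-illustration-only-0123456789",
        "level": "controlled",
    },
}
AUDIENCE = "example-genomics-service"  # hypothetical audience value


def authorize(token: str) -> tuple[str, str]:
    """Return (subject, access level) for a token from a trusted issuer."""
    # Peek at the unverified claims only to decide which key to use.
    unverified = jwt.decode(token, options={"verify_signature": False})
    issuer = unverified.get("iss")
    entry = TRUSTED_ISSUERS.get(issuer)
    if entry is None:
        raise PermissionError(f"untrusted issuer: {issuer!r}")
    # Now fully verify signature, audience, and issuer with that key.
    claims = jwt.decode(
        token,
        entry["key"],
        algorithms=["HS256"],
        audience=AUDIENCE,
        issuer=issuer,
    )
    return claims["sub"], entry["level"]
```

The unverified decode is used only to pick a key; no claim is trusted until the second, fully verified decode succeeds.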

But as you can see, the issue is that we need a protocol that works for both of these setups.

Much of this work is being done in the CINECA project, with a lot of input to and from GA4GH. Very few protocols come close to what already exists in the form of OIDC. Let me be clear: I hate it. However, I do not have much say, and I do not have any better suggestions either. If you are saying "build your own", please see the previous point. It is ridiculously hard to reach consensus in any such scientific gathering, and then there is the question of $$$.

And yes, I have looked into https://oauth.xyz/; no, it is not ready yet, and I do not even know whether it will be any better. Complexity breeds issues, but some of the problems at hand require complex solutions. Sure, OIDC is complicated too.

Thank you for reading this far through what amounts to a brain dump of the past year. Here's to another year with a focus on systems, Rust, privacy, and federation!