Programming with Credentials like a Pro

Christopher Lagali
5 min read · Mar 16, 2022
Photo by Jason Dent on Unsplash

Why do it?

Building and maintaining a Data Warehouse can get tricky when your source systems are external to your organization’s network or even geographical location.

The average data engineer, who in the past would normally be responsible only for maintaining ETL pipelines, is now expected to extract and consolidate data from various external sources. These could be APIs or just ordinary cloud-hosted databases. Unlike the dodo, we engineers have adapted with tools that not only extract data from such external microservices but also ensure reliability with retry mechanisms.

As if that weren’t complicated enough, we now have to deal with the Information Security team, who act like Big Brother (constantly looking over your shoulder) and always imagine the worst when APIs are mentioned. They believe that constant password rotation is the best way to keep hackers and troublemakers guessing.

So how do you ensure safe credential rotation and also avoid constant ETL changes?

Central Credential Stores

Luckily, cloud providers have a few tricks up their sleeves that we engineers can leverage to keep up with such requirements.

In this article, I will walk through a few credential stores that I have worked with, along with some code snippets.

HashiCorp Vault

The Vault project is one such popular credential store, offering a central repository for your keys, secrets and even certificates. It doesn’t just store credentials; it can also rotate your keys on a schedule for some services (HashiCorp keeps adding to its list of supported services).

Once it is set up, you simply connect to Vault and extract the credentials you need to establish a connection.

Key rotations can now be done with ease, and your ETL jobs will always pull the most recent credentials, avoiding the painful task of locating and changing keys within the code.

The best part about Vault is that it can be set up as a local service (within a VM) or in the cloud, depending on the scale of your implementation.

For demonstration purposes, I created a Vault server local to an EC2 instance using this link and wrote a Python script to extract the secrets from it.

Library used: hvac

Code that connects to a local Vault server and extracts secrets.
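A minimal sketch of such a script, assuming token authentication and a KV version 2 secrets engine mounted at secret/. The secret path etl/source_db, the environment variable defaults, and the helper names connect_to_vault and read_secret are hypothetical and chosen only for illustration:

```python
import os


def connect_to_vault(url=None, token=None):
    """Return an authenticated hvac client (assumes token authentication)."""
    import hvac  # third-party library: pip install hvac

    client = hvac.Client(
        # Assumed defaults: a local server address and a token in VAULT_TOKEN.
        url=url or os.environ.get("VAULT_ADDR", "http://127.0.0.1:8200"),
        token=token or os.environ.get("VAULT_TOKEN"),
    )
    if not client.is_authenticated():
        raise RuntimeError("Vault rejected the token; check VAULT_ADDR and VAULT_TOKEN")
    return client


def read_secret(client, path, mount_point="secret"):
    """Read the latest version of a KV v2 secret and return its key/value pairs."""
    response = client.secrets.kv.v2.read_secret_version(
        path=path, mount_point=mount_point
    )
    # KV v2 nests the payload under data -> data; version metadata sits beside it.
    return response["data"]["data"]
```

With a Vault server running and a token exported, something like read_secret(connect_to_vault(), "etl/source_db") would return the stored key/value pairs as a dictionary. Because the ETL code asks Vault at run time, a rotated password is picked up on the next run without a code change.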

Pros:

  • Open source and easy to use for encrypting and storing secrets.
  • Vault allows you to dynamically generate users and passwords for some well-known databases and for other tools such as SSH.
  • Vault can also generate PKI certificates.
  • There are several ways to interface with Vault: the CLI, the API or a programming library.
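The dynamic credentials mentioned in the list above can be requested through hvac as well. This is only a sketch: it assumes the database secrets engine is already mounted at database/, that an admin has configured a role for it, and that client is an authenticated hvac.Client. The function name and return shape are hypothetical.

```python
def get_dynamic_db_credentials(client, role, mount_point="database"):
    """Ask Vault's database secrets engine for a fresh, short-lived login.

    `client` is expected to be an authenticated hvac.Client; `role` is the
    name of a role configured on the database secrets engine (assumed here).
    """
    response = client.secrets.database.generate_credentials(
        name=role, mount_point=mount_point
    )
    return {
        "username": response["data"]["username"],
        "password": response["data"]["password"],
        # Seconds until Vault revokes this login unless the lease is renewed.
        "lease_duration": response["lease_duration"],
    }
```

Each call yields a new username and password, so there is no long-lived database password left to rotate by hand.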

Cons:

  • Though Vault is powerful, it can get complex.
  • Everything in Vault is addressed by a path, which is not always easy to deal with, especially from the CLI.
  • Inspecting audit logs can be tedious.

AWS Secrets Manager

If you are already operating within the AWS environment and want to avoid setting up your own credential store, then AWS Secrets Manager is the best option.

It is a managed service that only needs to be touched when a key needs to be rotated. IAM policies also need to be set up to give your services adequate rights to query Secrets Manager.

Beyond these two steps, nothing else is required and you should be good to go!
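A minimal sketch of reading a secret with boto3, assuming the secret was stored as JSON key/value pairs and that credentials come from the role attached to the machine running the code. The helper names, the region, and the secret name used below are hypothetical:

```python
import json


def make_secrets_client(region_name):
    """Create a Secrets Manager client; boto3 resolves credentials on its own
    (for example from the IAM role attached to an EC2 instance)."""
    import boto3  # third-party library: pip install boto3

    return boto3.client("secretsmanager", region_name=region_name)


def get_secret(client, secret_id):
    """Fetch a secret and decode it, assuming it was stored as a JSON string."""
    response = client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])
```

A call such as get_secret(make_secrets_client("us-east-1"), "etl/source_db") would then return a dictionary of the stored keys. Nothing secret lives in the code itself, only the name of the secret.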

Note: The script was executed on an EC2 instance that had adequate rights to query Secrets Manager within the appropriate AWS account.

Pros:

  • User-friendly UI and easy to maintain.
  • An admin can access and modify keys in a matter of minutes.
  • Credentials are secure and accessible only to those who are authorized via IAM policies.
  • Authorized users can be added through IAM policies, so it scales easily.

Cons:

  • Setting up access can get tricky when you need fine-grained access rights.
  • Some have experienced problems with rate limits/quotas (see AWS Limits).
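One common way to soften the rate-limit problem is to cache a secret in memory for a short time instead of calling the service on every use. The class below is a hypothetical sketch of that idea, not part of any AWS library; fetch stands for whatever function actually retrieves the secret, and the five-minute default is an arbitrary assumption.

```python
import time


class CachedSecret:
    """Keep a fetched secret in memory and refresh it after ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self._fetch = fetch          # callable that retrieves the secret
        self._ttl = ttl_seconds      # how long a fetched value is reused
        self._clock = clock          # injectable clock, handy for testing
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        # Call the secret store only on first use or once the cached copy expires.
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

The trade-off is staleness: after a rotation, a process may keep using the old value for up to ttl_seconds, so the TTL should be shorter than the window in which the old credential remains valid.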

VM Environment Variables

If you are a small team and do not have too many keys to rotate, or you do not wish to get too involved in such processes, then your VM’s environment variables are the way to go.

Essentially, you can store these variables as export commands in the user’s .bashrc file (see tutorial).

Engineers only need to reference the environment variable’s name within their code to access and use the credential. Some organizations have also embedded these export scripts within the VM’s image, thus automating the deployment of these environment variables.
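A minimal sketch of the code side, assuming .bashrc contains export lines for two variables. The names SOURCE_DB_USER and SOURCE_DB_PASSWORD and both helper functions are hypothetical:

```python
import os


def get_credential(name):
    """Return the value of an environment variable, failing loudly if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def load_db_credentials():
    # Hypothetical variable names; use whatever your export commands define.
    return {
        "user": get_credential("SOURCE_DB_USER"),
        "password": get_credential("SOURCE_DB_PASSWORD"),
    }
```

Failing with a clear error when a variable is missing tends to be easier to debug than a connection attempt made with an empty password.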

Note: The environment variables need to be added to the VM’s environment prior to the execution of the above script.

Pros:

  • Easy to maintain.
  • Only the admin of the VM can make changes to these variables.
  • Credentials are secure and accessible only to those who are authorized to access the VM.

Cons:

  • No versioning.
  • Rotation is a manual process, which invites human error.

Christopher Lagali

An Enthusiastic Data Eng. who is on a mission to unravel the possibilities of pipeline building with AWS and who believes in knowledge sharing.