Mastering GitHub Deployments with a Simple Screenshot
Chapter 1: The Importance of Proof of Execution (POE)
Incorporating a simple screenshot into your deployment process can save you considerable time and spare you the embarrassment of a failed deployment.
When I carefully opened a package that arrived unexpectedly one night, I felt a mix of excitement and anxiety. Had it not matched something on our wedding registry, I might have left it untouched, uneasy about its mysterious origins. Inside was a compact vacuum cleaner, which worked flawlessly for all of three weeks before it broke down. When we tried to resolve the problem with the manufacturer, they required detailed documentation to validate our claim.
In essence, they sought evidence of performance—or in this case, evidence of failure.
In a similar vein, our data engineering team has adopted a practice known as the POE screenshot.
POE stands for Proof Of Execution. The principle is straightforward: whenever code is committed—regardless of the scale of the change—it's essential to attach a screenshot showing that your build is functioning as intended. This is your final opportunity to provide explanations or justifications if things go awry.
Being part of a remote team, we rely heavily on clear communication, often bordering on redundancy. In this framework, a proof of execution screenshot not only reassures your reviewer but also demonstrates that you've diligently tested your modifications and grasped their implications for the production infrastructure.
This procedure was established for good reason; senior engineers recognized that there were too many unmonitored commits and reviews lacking insight into the expected outcomes. Thus, the practice of including a proof of execution screenshot addresses this concern by:
- Serving as a safeguard for both the committer and the reviewer.
- Enhancing the commit history with visual evidence of outputs.
- Providing a benchmark for reviewers to assess how changes will be implemented in production.
If you're relatively new to data engineering, you might wonder what output I’m referring to. Unlike data analysts and scientists who share visualizations or model outputs, our focus is on two main types of outputs:
- Script-oriented
- Data-oriented
Script-oriented output often includes metadata tied to a script's intended function, typically in the form of logging messages.
Note: While production logs like these would typically live in GCP's logging console, a POE screenshot captures logs from a testing environment, such as those produced inside a Docker container or a pyenv-managed virtual environment.
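To make that concrete, here is a minimal sketch of the kind of logging output a script-oriented POE screenshot might capture. The pipeline name and row count are purely hypothetical stand-ins, not anything from the actual build:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("payments_pipeline")  # hypothetical pipeline name


def run_pipeline() -> None:
    logger.info("Starting extraction from the source bucket")
    rows_loaded = 1200  # stand-in for the real load count
    logger.info("Loaded %d rows into the staging table", rows_loaded)
    logger.info("Pipeline finished successfully")


if __name__ == "__main__":
    run_pipeline()
```

A screenshot of that console output, taken from your test environment, is the whole POE: timestamps, levels, and the messages that show the script did what it claims.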
In data engineering, the primary output of interest is data-oriented. This type of screenshot answers the crucial question from reviewers: How will this appear in production, or how will it affect existing data?
Data outputs might include screenshots of any of the following (a minimal sketch follows the list):
- Programmatic data structures like lists, dictionaries, or data frames.
- Validation spreadsheets, such as Google Sheets.
- Test tables in BigQuery.
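As a sketch of that last item, and assuming the google-cloud-bigquery client library plus a hypothetical project, dataset, and test table, a data-oriented POE might be a screenshot of a quick preview like this:

```python
from google.cloud import bigquery  # assumes google-cloud-bigquery is installed

client = bigquery.Client()

# Hypothetical project, dataset, and test table; substitute your own.
query = """
    SELECT *
    FROM `my-project.staging.historic_payment_data`
    LIMIT 10
"""

df = client.query(query).to_dataframe()

# The POE screenshot would capture this preview and the row count.
print(df.head())
print(f"Rows returned: {len(df)}")
```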
Even with a comprehensive screenshot providing proof of execution, there's still a chance your reviewer might not fully understand the changes you're proposing.
In my experience, this often occurs with new builds or during busy periods when reviewers lack context about the adjustments you're aiming to implement.
To mitigate this, I find it beneficial to include a context comment alongside the proof of execution screenshot.
Similar to the screenshot, the context comment doesn’t need to be a literary masterpiece. It simply needs to inform your reviewer of the change, the requestor, and the potential impact on existing processes.
For example, let’s imagine I’m following Agile methodologies outside of work (though I don’t actually do so). A context comment for the commit might read as follows. This commit introduces a new cloud function that retrieves historical payment data in JSON format from Google Cloud Storage (GCS).
This request is specified in a ticket (if applicable) by a stakeholder. The modification will add a new table:
historic_payment_data
This table will be integrated with a broader engagement view to analyze earnings per article.
Comparing the load against the source file shows a 1:1 match with the values in the BigQuery table, with no discrepancies.
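To make the example concrete, here is a minimal sketch of what such a cloud function might look like. The bucket, file path, table ID, and function name are all hypothetical illustrations, not the actual build:

```python
import json

from google.cloud import bigquery, storage  # assumes both client libraries

BUCKET = "my-payments-bucket"                   # hypothetical bucket
SOURCE_BLOB = "exports/historic_payments.json"  # hypothetical file path
TABLE_ID = "my-project.analytics.historic_payment_data"


def load_historic_payments(request):
    """Entry point for a hypothetical HTTP-triggered cloud function."""
    # Pull the JSON export down from GCS and parse it.
    blob = storage.Client().bucket(BUCKET).blob(SOURCE_BLOB)
    records = json.loads(blob.download_as_text())

    # Stream the parsed records into the new BigQuery table.
    errors = bigquery.Client().insert_rows_json(TABLE_ID, records)
    if errors:
        return f"Load failed: {errors}", 500
    return f"Loaded {len(records)} rows into {TABLE_ID}", 200
```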
If I were reviewing a comment like that, I might think it excessive. Couldn’t one simply link to the ticket and table? While that’s certainly possible, I’ve learned that articulating an explanation in this manner not only assists the reviewer but also serves as a personal log of my changes, reinforcing the necessity (or lack thereof) of the request.
If you're curious about the output of that build, I wrote about it in detail in a separate article.
Including a QA metric, such as variations in row counts or values between the source file and test table, boosts my confidence that once merged, the change will function correctly and not disrupt existing processes.
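A row-count comparison along those lines, reusing the hypothetical bucket and test table from the sketches above, might look like this:

```python
import json

from google.cloud import bigquery, storage  # assumes both client libraries

# Hypothetical locations; substitute your own.
BUCKET = "my-payments-bucket"
SOURCE_BLOB = "exports/historic_payments.json"
TABLE_ID = "my-project.staging.historic_payment_data"

# Count the records in the source file on GCS.
blob = storage.Client().bucket(BUCKET).blob(SOURCE_BLOB)
source_count = len(json.loads(blob.download_as_text()))

# Count the rows in the test table.
client = bigquery.Client()
row = next(iter(client.query(f"SELECT COUNT(*) AS n FROM `{TABLE_ID}`").result()))
table_count = row.n

# A screenshot of this output is the QA metric itself.
print(f"Source file rows: {source_count}")
print(f"Test table rows:  {table_count}")
print("Match" if source_count == table_count else "Mismatch: investigate before merging")
```

Capturing both counts in a single screenshot gives the reviewer the metric and the proof in one image.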
During my initial year as a data engineer, I spent a lot of time rectifying failed commits. Instead of crafting syntactically correct, resilient, and accurate pipelines, I often found myself reacting to failed deployments or unintended behaviors.
I've since realized that an experienced engineer anticipates potential failure points and tests thoroughly in an environment that replicates the conditions present in production.
Ultimately, that's what this screenshot conveys—I completed the necessary work.
And hopefully, we won’t need to revisit this process again.