IaC Cost Estimator for Commit-Level AWS Cloud Budgeting

Sandeep Yaramchitti
Published in Cloud Native Daily · 6 min read · Mar 25, 2024

In the realm of cloud infrastructure, where every resource translates to an operational cost, here is an interesting project I built to proactively estimate AWS resource costs from infrastructure defined in CloudFormation or Terraform. It integrates seamlessly into the source code commit and pull request (PR) workflow, ensuring that every proposed infrastructure change is assessed for its financial impact before deployment.

Introduction

In today’s fast-paced tech environment, managing cloud infrastructure costs is as critical as the infrastructure itself. The iac-cost-estimator tool offers a robust solution to forecast and manage cloud expenses by providing precise cost estimations for Infrastructure as Code (IaC) configurations. This blog post delves into the architecture of the iac-cost-estimator, highlighting how its components work together to deliver real-time, accurate cost analysis for AWS cloud resources.

Architecture Overview

The architecture of the iac-cost-estimator comprises several interconnected components that work in harmony:

  1. Version Control System (VCS) Integration: The process begins with the VCS, such as GitHub, where IaC configurations are stored. Whenever there is a new commit or pull request, a webhook is triggered.
  2. Webhook Event: This event acts as a notification that prompts the iac-cost-estimator to begin its analysis. This is achieved through a GitHub App.
  3. API Gateway: Serving as the entry point for the iac-cost-estimator, the API Gateway receives the webhook event and forwards it to the appropriate Lambda function.
  4. Lambda Function: This function validates incoming requests to ensure they originate from GitHub itself, which is done by verifying the webhook signature. This avoids spending compute time on deliveries that are not from GitHub and helps prevent man-in-the-middle attacks.
  5. SQS Queue: The Lambda function sends messages to the specified SQS queue. The message body contains the push_event_data serialized as JSON; this is the payload that is passed to the queue and subsequently processed by the state machine.
  6. State Machine: A second Lambda function serves as an intermediary between Amazon SQS and AWS Step Functions. It receives messages from the SQS queue, starts an execution of a Step Functions state machine with the message content, and then deletes the processed message from the queue. The function handles multiple messages per invocation, starting a state machine execution for each one and handling exceptions gracefully.
  7. IaC Cost Estimation: The state machine orchestrates the entire IaC cost estimation based on the received webhook event data. It uses a Choice state to determine whether the estimation is for CloudFormation or Terraform, and the corresponding workflow is executed.
  8. Amazon Bedrock: The CloudFormation flow uses Amazon Bedrock to identify the AWS resource types with AI models, in this case the Cohere LLM. (Watch out for the source code; working with Bedrock from Lambda is a lot of fun.)
  9. Notification: The design reports results back to GitHub as comments and PR review comments, and in addition sends an email notification using AWS Simple Email Service (SES); a minimal sketch of the email step follows this list.
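The email notification step is not revisited later in this post, so here is a minimal sketch of it, assuming boto3 and SES are used; the function name, subject line, and addresses are hypothetical.

```python
import boto3

ses = boto3.client("ses")

def send_cost_report(estimate_summary: str, sender: str, recipient: str) -> None:
    """Email the cost estimation summary via Amazon SES.

    Both identities must be verified in SES (or the account must be out of the SES sandbox).
    """
    ses.send_email(
        Source=sender,
        Destination={"ToAddresses": [recipient]},
        Message={
            "Subject": {"Data": "IaC cost estimate for your latest commit"},
            "Body": {"Text": {"Data": estimate_summary}},
        },
    )
```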

Inside the iac-cost-estimator: A Technical Deep Dive (Part 1)

The process starts with creating a GitHub App, installing it in the repository, subscribing to the right events, and configuring the webhook.

Please refer to GitHub's official documentation on creating and configuring GitHub Apps.

Check out the following screenshot of the IaC Cost Estimator GitHub App.

Configure the webhook with the API Gateway URL; I have it fronted with a Route53 URL, and I will show the detailed code for how it is built.

As explained in the architecture overview, the entire solution relies on a commit to GitHub or a PR event, so be sure to set the permissions and subscribe to the right events.

Follow the instructions on GitHub's official site for more information on installing the app on repositories.

Now, let’s dive into the solution, starting with the AWS resources provisioned through IaC. In this case I have used a SAM template, and this section includes all the resources defined in the architecture: the Route53 URL, API Gateway, serverless Lambda functions, SQS, the state machine, Amazon Bedrock, and so on.

Once you deploy this template and configure the webhook as explained above, we are ready to receive webhook events on every commit or PR :)

Now, let’s look at how to validate the request to ensure it was initiated from GitHub and has not been tampered with. Check out GitHub's official documentation on this.

When we configure a GitHub webhook secret, GitHub sends a signature header along with the payload; the signature is a hash that GitHub computes from the secret and the payload. When API Gateway receives the event and forwards it to the Lambda function, the function validates the request by computing the hash again from the secret and payload and comparing the two. That is how the backend resources are protected: only requests with a valid GitHub signature are allowed through.

Also, refer to the following code snippet, based on GitHub's documentation, on how to validate the webhook signature in Python.
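A minimal sketch along the lines of GitHub's documented Python example, using HMAC-SHA256 and a constant-time comparison; the function name is mine.

```python
import hashlib
import hmac

def verify_github_signature(payload_body: bytes, secret_token: str, signature_header: str) -> bool:
    """Return True if the X-Hub-Signature-256 header matches a hash we compute ourselves."""
    if not signature_header:
        return False
    expected = "sha256=" + hmac.new(
        secret_token.encode("utf-8"),
        msg=payload_body,
        digestmod=hashlib.sha256,
    ).hexdigest()
    # compare_digest prevents timing attacks when comparing the two signatures
    return hmac.compare_digest(expected, signature_header)
```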

Now, let’s look at the Lambda code that extracts the push event data and checks whether CloudFormation files have been added, modified, or deleted.
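Here is a minimal sketch of that check, assuming the standard GitHub push-event payload (each commit carries added/modified/removed file lists); the extension tuple and function name are illustrative.

```python
# File extensions treated as IaC templates (illustrative; adjust for your repo layout)
IAC_EXTENSIONS = (".yaml", ".yml", ".json", ".tf")

def extract_iac_changes(push_event_data: dict) -> list:
    """Collect IaC files that were added, modified or removed in any commit of the push."""
    changes = []
    for commit in push_event_data.get("commits", []):
        for action in ("added", "modified", "removed"):
            for path in commit.get(action, []):
                if path.endswith(IAC_EXTENSIONS):
                    changes.append({"file": path, "action": action, "commit": commit["id"]})
    return changes
```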

Now, if the file changes are associated with CloudFormation or Terraform, we send them to SQS so the processing can run at scale and stay decoupled.
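A minimal sketch of that hand-off with boto3, assuming the queue URL is exposed through an environment variable (the variable and function names are hypothetical).

```python
import json
import os
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["COST_ESTIMATOR_QUEUE_URL"]  # hypothetical environment variable name

def enqueue_push_event(push_event_data: dict) -> str:
    """Serialize the push event payload and hand it to SQS for asynchronous processing."""
    response = sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps(push_event_data),
    )
    return response["MessageId"]
```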

Now, this Lambda function takes messages from an SQS queue, uses the contents of each message to start an execution of an AWS Step Functions state machine, and then deletes the messages from the queue once they have been processed.
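A minimal sketch of that intermediary, assuming a standard SQS-triggered Lambda event and the state machine ARN in an environment variable (both names are illustrative).

```python
import os
import boto3

sfn = boto3.client("stepfunctions")
sqs = boto3.client("sqs")
STATE_MACHINE_ARN = os.environ["STATE_MACHINE_ARN"]   # illustrative environment variable names
QUEUE_URL = os.environ["COST_ESTIMATOR_QUEUE_URL"]

def lambda_handler(event, context):
    """Start a state machine execution per SQS record, then delete the message once handed off."""
    for record in event.get("Records", []):
        try:
            sfn.start_execution(
                stateMachineArn=STATE_MACHINE_ARN,
                input=record["body"],  # the JSON-serialized push event data from the queue
            )
            sqs.delete_message(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=record["receiptHandle"],
            )
        except Exception as exc:
            # Leave the message on the queue so it becomes visible again and can be retried
            print(f"Failed to process message {record.get('messageId')}: {exc}")
```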

Now the state machine has the data it needs for the IaC cost estimation, and I am planning to write another blog post to explain it in detail. Please do watch for Part 2 of this blog :)

However, I would like to spend some time on the integration of Gen AI capabilities into this solution. I have been experimenting with the fully managed, high-performing LLMs that AWS offers and specifically integrated the Cohere LLM for text generation; in this case, it is called from the Lambda function to identify the AWS resource types used from the event data.

Let’s see the code in action, starting with the IaC.

The following code is a Lambda function designed to process a given event, fetch the file contents from URLs contained in the event, and use this data to invoke the LLM.
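Below is a minimal sketch of that function, assuming the state machine passes raw file URLs in the event and that the Cohere Command model on Bedrock is used; the event key, model ID, and prompt are illustrative rather than the exact ones from the source code.

```python
import json
import urllib.request
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "cohere.command-text-v14"  # illustrative Cohere model ID on Bedrock

def lambda_handler(event, context):
    """Fetch the changed template files and ask the model which AWS resource types they declare."""
    resource_types = {}
    for file_url in event.get("file_urls", []):   # assumption: the event carries raw file URLs
        with urllib.request.urlopen(file_url) as resp:
            template_body = resp.read().decode("utf-8")

        prompt = (
            "List the AWS resource types declared in the following "
            "infrastructure-as-code template:\n\n" + template_body
        )
        response = bedrock.invoke_model(
            modelId=MODEL_ID,
            contentType="application/json",
            accept="application/json",
            body=json.dumps({"prompt": prompt, "max_tokens": 400, "temperature": 0.2}),
        )
        result = json.loads(response["body"].read())
        resource_types[file_url] = result["generations"][0]["text"]

    return {"resource_types": resource_types}
```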

We’ve just scratched the surface of the iac-cost-estimator’s capabilities and the technical intricacies that make it such a valuable tool for AWS cloud budgeting. While we’ve explored the architectural framework and started our journey into the code, there is still more to uncover. Stay tuned for the next installment of this blog series, where I will dive deeper into the code walkthrough.

Sandeep Yaramchitti

Principal Software Engineer with a strong focus on Fullstack development, DevOps and Test Automation. https://cloudysky.link/