Make Data Acquisition Easy with AWS & Lambda (Python) in 12 Steps

Enter Lambda (yes I know you could also use Batch for this, but this is a Lambda tutorial).

AWS has a ton of services, but perhaps my favorite is AWS Lambda.

Basically, it lets you focus on writing code instead of dealing with annoying things like VPCs, EC2 instances, MySQL databases, etc.

Just write some Python, give that code to Lambda, and it will execute that code in the Cloud.

Even better, you can trigger that code in a variety of ways: every minute, once a day, when you put something into an S3 bucket, etc.

Lambda is totally awesome (and cheap) and has quickly become my most used AWS service.

What We Will Be Doing

In twelve concise steps, we will set up a Lambda function that will automatically pull data from Craigslist every day and store that data (in JSON) in S3.

This is just one example, but you could use this tutorial to run any Python function automatically at some interval you specify.

Admittedly this article is a bit long, but I wanted to make sure it would be easily reproducible, especially for beginners.

If you ever get stuck feel free to reach out to me and we can try to debug it.

A final note: everything here should be well within the free tier, so it won’t cost you a penny.

Lambda gives you a million free requests a month, so unless you plan on firing this baby every second you should be good.
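To put the million-free-requests figure in perspective, here is a quick back-of-the-envelope calculation of how many invocations a given schedule produces. It is a sketch that assumes a 30-day month and counts requests only. Compute time is billed separately.

```python
FREE_REQUESTS_PER_MONTH = 1_000_000  # Lambda free-tier request allowance
SECONDS_PER_DAY = 24 * 60 * 60


def invocations_per_month(every_n_seconds, days=30):
    """Number of runs in a month if the function fires every `every_n_seconds`."""
    return days * SECONDS_PER_DAY // every_n_seconds


for label, interval in [("once a day", SECONDS_PER_DAY),
                        ("every minute", 60),
                        ("every second", 1)]:
    runs = invocations_per_month(interval)
    status = "within" if runs <= FREE_REQUESTS_PER_MONTH else "OVER"
    print(f"{label:>12}: {runs:>9,} runs/month ({status} the free request tier)")
```

Once a day works out to 30 requests a month, and even once a minute is 43,200. Firing every second would be about 2.6 million, well past the allowance.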

Step One: Create A Free AWS Account

Head over to https://aws.amazon.com and sign up for a free account.

Note you will need to verify your phone number, but the whole process should only take a few minutes.

At the end, select the free “Basic” support plan. The AWS Free Tier then gives you a solid amount of compute power for free for 12 months.

Step Two: Create A Bucket To Store Our Data

S3 buckets are a place where we can store data (object data, to be more specific).

It’s basically like a folder on your computer, but in the cloud! Go to Services > Storage > S3 and you will be brought to the S3 splash screen.

Click “Create bucket” and give your bucket a name and pick the region closest to you.

I named my bucket “my-super-sweet-bucket”, and you can name yours whatever you want.

Just know the bucket names have to be unique globally, so you can’t have my awesome bucket name.

After giving it the name and region just stick with all the defaults (click Next until your bucket is created).

This will create your bucket. Make a note of what you called it, as we will need it later.
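If you would rather create the bucket from code, boto3’s S3 client has a create_bucket call. What follows is a minimal sketch, not the tutorial’s method. It assumes boto3 is installed locally and your AWS credentials are already configured (for example via the AWS CLI). One quirk is worth knowing: in us-east-1 you must leave out the LocationConstraint, while other regions need it.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build the keyword arguments for S3's create_bucket call."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is S3's default region and rejects an explicit LocationConstraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(bucket_name, region):
    """Create the bucket in `region` (requires boto3 and AWS credentials)."""
    import boto3  # imported here so create_bucket_kwargs works without boto3

    s3 = boto3.client("s3", region_name=region)
    return s3.create_bucket(**create_bucket_kwargs(bucket_name, region))
```

For example, create_bucket("my-super-sweet-bucket", "us-west-2") would create the bucket in Oregon. Bucket names are global, though, so substitute your own name.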

Step Three: Create A Lambda Function

Now that we have a bucket to store our data from Lambda, it’s time to create our Lambda function.

Hover over Services at the top, and under Compute you will find Lambda.

Lambda is found under Compute (a common AWS Cloud Practitioner exam question). Since you haven’t made a Lambda yet, you will get a splash screen introducing you to the world of Lambda.

Click the “Create a Function” button to make your Lambda.

Next you will be asked to name your function, specify a programming language, and give it a role.

Quick Aside: What is a Role?

Amazon Web Services is a vast and mysterious beast, with a lot of things going on.

One thing worth noting is that nearly everything in AWS follows a principle of “least privilege”, which basically says that by default nothing gets permission to access anything.

For example, let’s say you create a user named Bob in your AWS account.

By default, when Bob logs in, he can’t access any AWS service.

Bob can log in, but he can’t use S3, EC2, Lambda or anything else.

Poor Bob.

Unless access is explicitly granted, none is given.

Anyways, a role is a way of programmatically granting access.

Basically it will tell AWS that “This is my Lambda function and I want it to be able to access S3”.

Without this role your Lambda function will be just like Poor Bob with no access.

Don’t make your Lambda like Bob.

Anyways, you can name your Lambda function whatever you want.

I named mine creatively “pullCraigslistAds”.

Since it’s 2019, and we are modern men and women, we will be using Python 3.7 and not any of that 2.7 nonsense.

Note that Lambda can be used with a lot of other programming languages, but here we will use Python.

For now Permissions can be set to “Create a new role with basic Lambda permissions”.

We will fix our role later in the tutorial.

Step 4: Choose Upload a .zip file for your code

The whole point of Lambda is to have it execute some code for us.

Once you’ve created your Lambda function you will notice there is a code editor right there in the browser.

Often you can just write code right there in the editor.

However, in this tutorial we are going to use some outside libraries (ones not built into Python), like Requests, so we will need to do things a bit differently by uploading a zip file.

If you don’t need to import any libraries that don’t come with Python you can just write it inline.

If you do need to import libraries we will need the Zip file.

That is, Lambda can run your Python code no problem-o, but what if you need libraries like Pandas, Requests, Matplotlib or anything else you installed with pip?

In this tutorial we will be using the python-craigslist library, so we go with the “Upload a .zip file” option.
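For comparison, here is the kind of function that can live entirely in the inline editor because it only uses the standard library. It is an illustrative sketch, not part of this tutorial’s scraper. The name lambda_handler matches the console’s default handler for Python functions.

```python
import json
from datetime import datetime, timezone


def lambda_handler(event, context):
    """Stdlib-only handler: no zip file needed, paste it into the inline editor."""
    now = datetime.now(timezone.utc)
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "Hello from Lambda", "utc_time": now.isoformat()}),
    }
```

Swap the json import for something like import requests and the inline approach stops working, which is exactly the problem the zip file solves.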

Step 5: Our Function’s Code

Note: If you are in a rush and want to skip some steps, you can clone this repository, which will provide you the final Zip file you need to give to Lambda.

If you just clone the repo you can jump to Step 8.

If you aren’t lazy and want to follow along, create a new folder anywhere on your machine and save the code below into a file called “lambda_function.py”.
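The original code listing did not survive in this copy of the article. As a stand-in, here is a minimal sketch of what such a lambda_function.py could look like. It assumes python-craigslist’s CraigslistHousing class, the “number_of_posts” environment variable from Step 9, and a bucket named my-super-sweet-bucket. The site, area and category values, the S3 key layout, and the helper names fetch_ads and build_object_key are all illustrative assumptions. The line numbers mentioned in the walkthrough below refer to the author’s original file, not to this sketch.

```python
import json
import os
from datetime import datetime, timezone

# Assumption: replace with the bucket you created in Step 2.
BUCKET_NAME = "my-super-sweet-bucket"


def fetch_ads(limit):
    """Scrape up to `limit` apartment ads using python-craigslist."""
    # Imported here rather than at the top only so the other helpers can be
    # tried locally without the package; in Lambda it comes from the zip.
    from craigslist import CraigslistHousing

    # Assumed search: apartments ("apa") in San Francisco; change to your city.
    housing = CraigslistHousing(site="sfbay", area="sfc", category="apa")
    return list(housing.get_results(sort_by="newest", limit=limit))


def build_object_key(now):
    """One JSON file per run, named after the UTC time of the run."""
    return now.strftime("craigslist/%Y-%m-%d_%H-%M-%S.json")


def lambda_handler(event, context):
    # number_of_posts is the environment variable set in Step 9.
    limit = int(os.environ.get("number_of_posts", "5"))
    ads = fetch_ads(limit)

    # Imported lazily for the same reason; boto3 is preinstalled in Lambda.
    import boto3

    key = build_object_key(datetime.now(timezone.utc))
    boto3.client("s3").put_object(
        Bucket=BUCKET_NAME,
        Key=key,
        # default=str turns any non-JSON values (e.g. datetimes) into strings.
        Body=json.dumps(ads, default=str).encode("utf-8"),
        ContentType="application/json",
    )
    return {"statusCode": 200, "body": json.dumps({"saved": key, "count": len(ads)})}
```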

What is this doing? Basically, it uses a Python library called craigslist to scrape Craigslist for us.

The first few lines import our libraries.

Line 7 is where we define our function, giving it the input of event and context.

If you want to get nerdy and read into it, check out the AWS Docs.

At a high level, event passes the function metadata about the trigger, and context passes runtime information to the function.

For this function we don’t need to mess with them much, but in the next part of this series we will need to use event.

For example, if we trigger a Lambda when something is put into an S3 bucket we can use event to get that file name, which is often handy.
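As a taste of what event looks like for an S3 trigger, here is a minimal sketch that pulls the bucket and object key out of an S3 notification. The nested Records / s3 / bucket / object layout is the standard S3 event format. Object keys arrive URL-encoded (spaces become +), hence unquote_plus. The sample event is trimmed to just the fields used, and the helper name uploaded_objects is hypothetical.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Return (bucket, key) pairs for every record in an S3 notification event."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys are URL-encoded in the event: "my file.json" arrives as "my+file.json".
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lambda_handler(event, context):
    for bucket, key in uploaded_objects(event):
        print(f"New object s3://{bucket}/{key}")


# Trimmed-down example of an S3 event (real events carry many more fields).
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-super-sweet-bucket"},
                "object": {"key": "craigslist/2019-05-01+ads.json"}}}
    ]
}
```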

Anyways, the rest of the function is fairly straightforward for those who have used Python: instantiate our class, tell it what data to pull, and store that in some JSON.

Lines 28 to 31 are how we send that data to S3 using Boto3.

If you’ve never used Boto3, it is a Python SDK, or in plain English it is how you can interact with AWS via Python.

Boto3 lets you put stuff in S3, invoke a Lambda, create a bucket, etc.

If you want to use AWS resources from a Python script, then Boto3 is your answer.

Here we use one of its most basic features: output to our S3 bucket.

On line 29 you will need to update the bucket name to whatever you named your bucket in Step 2.

For reference, below is some sample data from our function, which is just data scraped from a Craigslist ad for apartments.

A final note: this is just our sample code, and it could be anything.

For example, it could call an API, dump that data to JSON and store that JSON in a NoSQL database.

Here we are outputting JSON to an S3 bucket but this could be many things.

Step 6: Pip Install Our Dependencies

Lambda runs Python in the cloud for you, but what does its version of Python include? Anyone who uses Python regularly is highly dependent on libraries, which are open source, pre-written blobs of code we can use (“import pandas as pd”, anyone?). As an example, what happens in our Lambda (if I go back to the inline editor) if I import requests?

Note: Be sure to save your function before clicking Test, or else you are testing the previous version of the code, without requests included.

A good rule of thumb is to always save your Lambda function. It’s easy to forget, and then you end up debugging the previous version.

Fail! Lambda doesn’t come with external libraries: our function fails because this Python environment doesn’t have the requests library.
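One quick way to see what a given Python environment, Lambda included, can import is importlib.util.find_spec, which returns None for top-level modules that aren’t installed. A small sketch follows. The module list and the helper name missing_modules are just examples.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def lambda_handler(event, context):
    # In a bare Lambda Python runtime, "requests" and "craigslist" should show
    # up as missing until you ship them in the zip file.
    return {"missing": missing_modules(["json", "boto3", "requests", "craigslist"])}
```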

So how do we get it (or any library)? There are several ways, but this is the one I’ve found easiest.

We already have our lambda_function.py file saved in a folder.

Next, pip install your dependencies into that same folder.

Looking back at our code, the only dependency we need to install is the craigslist library. The other libraries it uses are already available: json, datetime and os are part of Python’s standard library, and boto3 comes preinstalled in the Lambda Python runtime.

Until recently I had no idea you could use pip to install packages straight into your current directory, but it works quite nicely. From inside the folder, run:

pip install python-craigslist -t .

This will install python-craigslist (and its dependencies) into your folder alongside your lambda_function.py.

Step 7: Create A Zip File

Thinking back to our Lambda function on AWS, we chose the option to upload a Zip file.

Now, my friends, is the time to make the aforementioned Zip file! Select everything in your folder and put it into a zip file named lambda_function.zip.

I use 7Zip so I literally just highlight all the files and folders, right click, choose 7Zip and add it to a Zip file.

Then I rename it to lambda_function.zip.

Your folder should now hold the Zip file alongside your code and libraries, although from here on all you need is that Zip file.
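If you’re not on Windows, or don’t have 7Zip handy, Python’s standard zipfile module can build the same archive. Below is a minimal sketch. It assumes your code and the pip-installed packages all sit in one folder, and build_deployment_zip is a hypothetical helper name. The key detail is that lambda_function.py ends up at the root of the zip rather than inside a subfolder, which is where Lambda looks for a handler named lambda_function.lambda_handler.

```python
import os
import zipfile


def build_deployment_zip(src_dir, zip_path):
    """Zip everything under src_dir so its contents sit at the archive root."""
    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                if os.path.abspath(full_path) == zip_abs:
                    continue  # the archive may live in src_dir; don't zip it into itself
                # Store paths relative to src_dir, e.g. "lambda_function.py"
                # rather than "my_folder/lambda_function.py".
                zf.write(full_path, arcname=os.path.relpath(full_path, src_dir))
    return zip_path
```

Run from inside your folder, build_deployment_zip(".", "lambda_function.zip") produces the file that Step 8 uploads.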

Step 8: Upload your Zip file to Lambda

First, I can feel you judging me for using Windows, and I don’t like it.

Anyways, click the Upload button and select your newly created zip file.

Once you’ve done that, click the “Save” button in the top right.

This will bring your lambda_function.py into the inline editor, and your Lambda environment will now have all your needed libraries! Also, from this point forward, if you need to make changes to the code you can just edit it inline instead of having to repackage your zip file.

The only reason you would need to repackage the zip file is if you have a new dependency on another library.

If that happens then repeat Steps 6–8.
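If you end up repackaging often, the upload can also be scripted with boto3’s Lambda client. Its update_function_code call accepts the zip’s raw bytes. This is a minimal sketch that assumes configured AWS credentials and the function name pullCraigslistAds from Step 3. The client parameter is there only so the function can be exercised without AWS.

```python
def upload_zip(zip_path, function_name="pullCraigslistAds", client=None):
    """Replace the deployed function's code with the contents of zip_path."""
    if client is None:
        import boto3  # needs local AWS credentials

        client = boto3.client("lambda")
    with open(zip_path, "rb") as f:
        zip_bytes = f.read()
    response = client.update_function_code(FunctionName=function_name, ZipFile=zip_bytes)
    return response["CodeSize"]  # size in bytes of the uploaded package
```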

Step 9: Create Your Environment Variable

Going fully into environment variables is a bit beyond the scope of this already lengthy tutorial, but basically you can set them in Lambda.

If you look at our code, we read an environment variable, “number_of_posts”, which is how many posts we will pull.

Let’s be nice to Craigslist and not overload their system, so just set it to something low like 5 or 10.

This creates the environment variable that our code reads when it is invoked.
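Environment variables always reach your code as strings, so it is worth converting and sanity-checking number_of_posts before using it. Here is a small sketch. The fallback of 5, the cap of 10 and the helper name get_number_of_posts are assumptions in the spirit of going easy on Craigslist.

```python
import os

DEFAULT_POSTS = 5  # assumed fallback if the variable is missing or not a number
MAX_POSTS = 10     # assumed cap so a typo can't hammer Craigslist


def get_number_of_posts(environ=os.environ):
    """Read number_of_posts from the environment as a bounded int."""
    raw = environ.get("number_of_posts", "")
    try:
        value = int(raw)
    except ValueError:
        return DEFAULT_POSTS
    return max(1, min(value, MAX_POSTS))
```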

Step 10: Set your timeout value

Incoming AWS Developer Associate exam question! By default, a Lambda function is allowed 3 seconds to execute.

For this example that simply isn’t enough time.

You can change this timeout value under the “Basic Settings”.

The max is 15 minutes, but we should be able to get away with 3 minutes.

If you were mean and set your function to pull a lot of ads, say 200, you would need more time.

Note that Lambda charges you for memory and compute time.
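Because Lambda bills by memory multiplied by duration (in GB-seconds), it’s easy to sanity-check a timeout against the free tier, which includes 400,000 GB-seconds of compute per month. The back-of-the-envelope sketch below assumes the default 128 MB of memory and a 30-day month. It also assumes a worst case in which every daily run uses the full 3-minute timeout. The configure_timeout helper shows the equivalent boto3 call and takes an injectable client so it can be tried offline.

```python
FREE_GB_SECONDS_PER_MONTH = 400_000  # Lambda free-tier compute allowance
MAX_TIMEOUT_SECONDS = 15 * 60        # Lambda's hard limit: 15 minutes


def gb_seconds(memory_mb, duration_s, invocations):
    """Billed compute: memory in GB x seconds per run x number of runs."""
    return memory_mb / 1024 * duration_s * invocations


def configure_timeout(function_name, seconds, client=None):
    """Set the function's timeout (the console's Basic settings > Timeout)."""
    if not 1 <= seconds <= MAX_TIMEOUT_SECONDS:
        raise ValueError(f"timeout must be between 1 and {MAX_TIMEOUT_SECONDS} seconds")
    if client is None:
        import boto3  # needs local AWS credentials

        client = boto3.client("lambda")
    return client.update_function_configuration(FunctionName=function_name, Timeout=seconds)


# Worst case: 128 MB, every daily run lasts the full 180 s timeout, 30 runs a month.
usage = gb_seconds(128, 180, 30)
print(f"{usage:.0f} GB-seconds of {FREE_GB_SECONDS_PER_MONTH:,} free")
```

Even in that worst case the function uses 675 GB-seconds a month, a tiny fraction of the allowance.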

Step 10: Add Permission For Your Lambda Function

I’m sure at this point this article is feeling as long as a Jane Austen novel, but think back to Step 3, where we created our Lambda function.

I mentioned that we would need to give Lambda some permissions to put stuff in an S3 bucket.

We’ve arrived at that time now.

Here’s how we do this (as concisely as possible):

1. Go to Services > Security, Identity, & Compliance > IAM.
2. Select Roles in the left-hand navigation.
3. Click Create role.
4. Under “Choose the service that will use this role”, choose Lambda and click “Next: Permissions”.

Now you get to assign a Policy.

At a high level a Policy is how you give permissions.

We are creating a Role and in order for a Role to be able to do anything it needs a Policy attached that defines the permissions.

That is, a role on its own doesn’t grant any permissions.

It’s not until you attach a Policy that it can access resources.
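To make “a Policy defines the permissions” concrete: a policy is a JSON document made of statements. Below is a sketch of a narrower alternative to a full-access S3 policy. It allows only s3:PutObject into a single bucket, and my-super-sweet-bucket and the helper name put_only_policy are placeholders. The policy is built as a Python dict and serialized with json.dumps, which is the string form boto3’s IAM calls (such as put_role_policy) expect for a policy document.

```python
import json


def put_only_policy(bucket_name):
    """IAM policy allowing uploads (PutObject) to one bucket and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:PutObject",
                # Object-level actions target "bucket/*", not the bucket ARN itself.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }


policy_json = json.dumps(put_only_policy("my-super-sweet-bucket"), indent=2)
print(policy_json)
```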

Luckily AWS has a bunch of Policies already made, so search S3 at the top and choose “AmazonS3FullAccess”, click the check box, and click Next.

Don’t add any Tags; just click onward (this tutorial is long enough as it is, but Tags let you attach metadata to resources).

In the final step we name our Role.

I am going to call mine LambdaS3AllAccess, but you can name yours whatever you want.

Click the create role button, and BAM, we have a role.
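The console clicks above map onto two boto3 IAM calls. The first is create_role, which needs a trust policy saying the Lambda service may assume the role. The second is attach_role_policy, which takes the ARN of the AWS managed AmazonS3FullAccess policy. Here is a minimal sketch that assumes configured credentials with IAM permissions and reuses the role name from above. The client parameter exists only so the code can be exercised without AWS.

```python
import json

# Trust policy: lets the Lambda service assume this role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
S3_FULL_ACCESS_ARN = "arn:aws:iam::aws:policy/AmazonS3FullAccess"


def create_lambda_s3_role(role_name="LambdaS3AllAccess", client=None):
    """Create a role Lambda can assume and attach AmazonS3FullAccess to it."""
    if client is None:
        import boto3  # needs local AWS credentials with IAM permissions

        client = boto3.client("iam")
    role = client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    client.attach_role_policy(RoleName=role_name, PolicyArn=S3_FULL_ACCESS_ARN)
    return role["Role"]["Arn"]  # the ARN to select as the function's execution role
```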

Go back to your Lambda function.

Under “Execution Role” select your new role, then save your Lambda function.

Step 11: Test Your Function

Finally, our Lambda is ready to be tested! So how do we check that this article hasn’t been a bunch of baloney and this stuff actually works? Luckily, Lambda has a test function built right in! The default test event should be fine; just give it a name.
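Besides the console’s Test button, you can invoke the deployed function from your own machine with boto3’s invoke call, whose Payload comes back as a stream to read. Below is a minimal sketch that assumes configured credentials and the function name from Step 3. The helper name invoke_deployed is hypothetical, and the client parameter only exists so the code can be tried offline.

```python
import json


def invoke_deployed(function_name="pullCraigslistAds", event=None, client=None):
    """Synchronously invoke the deployed function and return its decoded result."""
    if client is None:
        import boto3  # needs local AWS credentials

        client = boto3.client("lambda")
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function to finish
        Payload=json.dumps(event or {}).encode("utf-8"),
    )
    result = json.loads(response["Payload"].read())
    # FunctionError is only present when the handler raised an exception.
    if "FunctionError" in response:
        raise RuntimeError(f"Lambda reported an error: {result}")
    return result
```

For a purely local check before uploading, you can also import lambda_handler from lambda_function.py and call it with an empty dict as the event and None as the context, as long as your code doesn’t rely on the context object.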
