AWS API Gateway and Lambda Proxy integration

This is as much a note to self as a blog post; I would also like to get back into blogging.

Lambda integration vs. Lambda-Proxy

We just went through the process of setting up an API Gateway proxy for Lambda again today, and API Gateway seems to have changed quite a bit recently.

One of the new Integration Request options is “Lambda Proxy”, which basically just sends the full request to Lambda (including the URL info), as opposed to just invoking the function.

The input would look something like (docs):

{
  "message": "Hello me!",
  "input": {
    "resource": "/{proxy+}",
    "path": "/hello/world",
    "httpMethod": "POST",
    "headers": {
      "Accept": "*/*",
      "Accept-Encoding": "gzip, deflate",
      "cache-control": "no-cache",
      "CloudFront-Forwarded-Proto": "https",
      "CloudFront-Is-Desktop-Viewer": "true",
      "CloudFront-Is-Mobile-Viewer": "false",
      "CloudFront-Is-SmartTV-Viewer": "false",
      "CloudFront-Is-Tablet-Viewer": "false",
      "CloudFront-Viewer-Country": "US",
      "Content-Type": "application/json",
      "headerName": "headerValue",
      "Host": "gy415nuibc.execute-api.us-east-1.amazonaws.com",
      "Postman-Token": "9f583ef0-ed83-4a38-aef3-eb9ce3f7a57f",
      "User-Agent": "PostmanRuntime/2.4.5",
      "Via": "1.1 d98420743a69852491bbdea73f7680bd.cloudfront.net (CloudFront)",
      "X-Amz-Cf-Id": "pn-PWIJc6thYnZm5P0NMgOUglL1DYtl0gdeJky8tqsg8iS_sgsKD1A==",
      "X-Forwarded-For": "54.240.196.186, 54.182.214.83",
      "X-Forwarded-Port": "443",
      "X-Forwarded-Proto": "https"
    },
    "queryStringParameters": {
      "name": "me"
    },
    "pathParameters": {
      "proxy": "hello/world"
    },
    "stageVariables": {
      "stageVariableName": "stageVariableValue"
    },
    "requestContext": {
      "accountId": "12345678912",
      "resourceId": "roq9wj",
      "stage": "testStage",
      "requestId": "deef4878-7910-11e6-8f14-25afc3e9ae33",
      "identity": {
        "cognitoIdentityPoolId": null,
        "accountId": null,
        "cognitoIdentityId": null,
        "caller": null,
        "apiKey": null,
        "sourceIp": "192.168.196.186",
        "cognitoAuthenticationType": null,
        "cognitoAuthenticationProvider": null,
        "userArn": null,
        "userAgent": "PostmanRuntime/2.4.5",
        "user": null
      },
      "resourcePath": "/{proxy+}",
      "httpMethod": "POST",
      "apiId": "gy415nuibc"
    },
    "body": "{\r\n\t\"a\": 1\r\n}"
  }
}

The response should look something like (docs):

{
  "statusCode": httpStatusCode,
  "headers": { "headerName": "headerValue", ... },
  "body": "..."
}
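
To make the event and response shapes concrete, here is a minimal sketch of a Python handler that consumes the proxy event and returns a proxy-style response. The file name, handler name, and “Hello” message are just placeholders; only the field names follow the payloads shown above.

handler.py
import json


def handler(event, context):
    # 'event' is the Lambda Proxy payload shown above.
    name = (event.get('queryStringParameters') or {}).get('name', 'world')
    body = json.loads(event['body']) if event.get('body') else {}

    # The return value must follow the statusCode/headers/body shape,
    # otherwise API Gateway responds with a 502.
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps({
            'message': 'Hello {}!'.format(name),
            'path': event.get('path'),
            'received': body,
        }),
    }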

These updates are mentioned in the AWS blog. Cheers.

S3 set_contents_from_string snippet

I’m putting this here because I need it every time and I have to go looking on the internet for it. It does 3 things:

  • puts a string into a key on S3
  • sets the ACL to public
  • sets the type to application/json so it can be consumed like a service call
s3.py

import json
import logging

import boto

logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger()

# site_name (the site identifier) and struct (the dict to publish) are
# assumed to be defined elsewhere in the original script.

s3 = boto.connect_s3()
bucket = s3.get_bucket('my-fake-s3-service')
key = bucket.get_key('{}/endpoint.json'.format(site_name))

if key is None:
    log.warn('Creating new key for {}'.format(site_name))
    key = bucket.new_key('{}/endpoint.json'.format(site_name))
    key.content_type = 'application/json'

key.set_contents_from_string(json.dumps(struct),
                             replace=True,
                             policy='public-read')

Now the next time I need this, hopefully I remember to find it here.

Experimenting with Deis on AWS EC2 VPC

Note: this article deals with Deis as of 0.9.0.

At work we’ve been looking for a good solution to consolidate our service layer. We have around 30-40 backing services for our application and web tiers. The problem with the current setup is that they’re deployed all over the place: some are in EC2-land, some are colocated, and they’re written in various languages.

We’re working on bringing our services together into a unified architecture:

  • service stack (Python/WSGI)
  • load balancer
  • service cache
  • service proxy

…or something like that. I’m primarily interested in the first item today: service stack.

The problem with EC2

We absolutely love Amazon Web Services. Their services remove a lot of the headache from our day-to-day life. However,
EC2 instances get pricey, especially when deploying a lot of services. Even if we put all of our services on their own
micro instances (which isn’t a great idea) the hard cost is pretty clear… let’s assume 50 services at today’s prices:

  • 50 t1.micro instances x ~$14/month = $700/month

This doesn’t seem like an astounding number, but to be fair, this probably isn’t an accurate estimate;
we’re still forgetting high availability and load balancing, as well as snapshots. Assuming a t1.micro is a great
option for all services, let’s factor in the new costs:

  • (50 t1.micro instances x 2 availability zones x $14/month) + (50 elastic load balancers x $14/month) = $2100/month

Having a high availability setup is starting to add up a bit, and there are still additional costs for EBS, snapshots, S3
(if you’re using it), and data transfer.
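
Just for fun, here is the back-of-the-napkin arithmetic above as a tiny script; the figures are the rough estimates used in this post, not real AWS pricing.

costs.py
# Rough monthly estimate for one-instance-per-service, using the numbers above.
SERVICES = 50
INSTANCE = 14   # ~$/month per t1.micro
ELB = 14        # ~$/month per elastic load balancer
ZONES = 2       # availability zones for high availability

single_az = SERVICES * INSTANCE
ha_setup = SERVICES * ZONES * INSTANCE + SERVICES * ELB

print('single AZ: ${}/month'.format(single_az))  # $700/month
print('HA setup: ${}/month'.format(ha_setup))    # $2100/month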

The case for Deis

Deis is a platform-as-a-service (PaaS) much like Heroku or Elastic Beanstalk. There are a few other open source solutions out there like Flynn.io (which, at the time of writing, is still in alpha) and dokku. Dokku will only run on a single instance, which doesn’t make it quite as appealing as Deis.

The whole idea behind Deis is that you can simply git push your application code to the cluster controller, and the platform will take care of the rest, including providing the hostname and options to scale. The solution is based on Docker, which makes it extra appealing.

Enter AWS EC2

There is a contrib setup for EC2 in the source repository, and it looks like it should work out of the box. It doesn’t appear to have support for AWS VPC just yet, but it is just using CloudFormation stacks behind the scenes, so the configuration should be pretty straightforward. Now, I don’t know about anyone else, but CloudFormation templates (example) make my brain hurt. AWS’s documentation is pretty clear for the most part, but sometimes it seems like you need a very specific configuration in about 6 different places for something to work. VPC is very much like this.

Fortunately, someone has already tackled this and there is a pull request open to merge it into the deis trunk. The pull request effectively adds a couple of configuration values that should allow Deis to operate in a VPC. Primarily, there are a couple of environment variables you need to set before provisioning the cluster:

export VPC_ID=vpc-a26218bf
export VPC_SUBNETS=subnet-04d7f942,subnet-2b03ab7f

I did run into one problem with this, however: when I specified the environment variables, my CloudFormation stack creation was failing (you can find this in your AWS console under the CloudFormation section; click on a stack and select the ‘Events’ tab below). The error I was getting was related to the VPC and the AutoScalingGroup not being created in the same availability zone, so I had to tweak the CloudFormation template under the Resources:CoreOSServerAutoScale:Properties:AvailabilityZones key to reflect my AZ (namely us-west-2b).

The IGW

Another problem I ran into (that had nothing to do with Deis) was that I was deploying to an availability zone that didn’t actually have access to the public internet (no internet gateway/NAT). This manifested as an error when trying to make run against the cluster:

Failed creating job deis-registry.service: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]

Once I got into the proper availability zone, the problem went away.

make run

I ran into some more issues when I finally got make run-ning… first, the “activation” part of it took a really long time, like 15 minutes. Eventually I got this error:

af:deis aaronfay$ make run
fleetctl --strict-host-key-checking=false submit registry/systemd/deis-registry.service logger/systemd/deis-logger.service cache/systemd/deis-cache.service database/systemd/deis-database.service
Starting 1 router(s)...
Job deis-router.1.service scheduled to 22e48bb9.../10.0.14.17
Starting Deis! Deis will be functional once all services are reported as running...
fleetctl --strict-host-key-checking=false start registry/systemd/deis-registry.service logger/systemd/deis-logger.service cache/systemd/deis-cache.service database/systemd/deis-database.service
Job deis-registry.service scheduled to 4cb60f67.../10.0.14.16
Job deis-logger.service scheduled to 22e48bb9.../10.0.14.17
Job deis-database.service scheduled to 4cb60f67.../10.0.14.16
Job deis-cache.service scheduled to bc34904c.../10.0.14.13
Waiting for deis-registry to start (this can take some time)...
Failed initializing SSH client: Timed out while initiating SSH connection
Status: Failed initializing SSH client: Timed out while initiating SSH connection
Failed initializing SSH client: Timed out while initiating SSH connection
One or more services failed! Check which services by running 'make status'
You can get detailed output with 'fleetctl status deis-servicename.service'
This usually indicates an error with Deis - please open an issue on GitHub or ask for help in IRC

One of the admins in the #deis IRC channel mentioned that you can run make run again with no problems; however, after several runs I still couldn’t get the command to complete error-free. make status pointed out the issue with the controller:

af:deis aaronfay$ make status
fleetctl --strict-host-key-checking=false list-units
UNIT LOAD ACTIVE SUB DESC MACHINE
deis-cache.service loaded active running deis-cache bc34904c.../10.0.14.13
deis-controller.service loaded failed failed deis-controller 22e48bb9.../10.0.14.17
deis-database.service loaded active running deis-database 4cb60f67.../10.0.14.16
deis-logger.service loaded active running deis-logger 22e48bb9.../10.0.14.17
deis-registry.service loaded active running deis-registry 4cb60f67.../10.0.14.16
deis-router.1.service loaded active running deis-router 22e48bb9.../10.0.14.17

failed, hey? The same admin on the channel recommended fleetctl start deis-controller, although I think I’m using an older version of fleetctl (0.2.0?) and I had to actually run fleetctl start deis-controller.service. That appears to have worked:

af:deis aaronfay$ fleetctl start deis-controller.service
Job deis-controller.service scheduled to 22e48bb9.../10.0.14.17

af:deis aaronfay$ make status
fleetctl --strict-host-key-checking=false list-units
UNIT LOAD ACTIVE SUB DESC MACHINE
deis-cache.service loaded active running deis-cache bc34904c.../10.0.14.13
deis-controller.service loaded active running deis-controller 22e48bb9.../10.0.14.17
deis-database.service loaded active running deis-database 4cb60f67.../10.0.14.16
deis-logger.service loaded active running deis-logger 22e48bb9.../10.0.14.17
deis-registry.service loaded active running deis-registry 4cb60f67.../10.0.14.16
deis-router.1.service loaded active running deis-router 22e48bb9.../10.0.14.17

So far so good

I now have Deis running in the VPC; well, the first bits anyway. I will follow up with a second part that covers DNS configuration, initializing a cluster, and deploying an app.

Cheers,

Converting a SQLite3 database to MySQL

I needed to migrate some SQLite3 databases to MySQL for a couple of django-cms projects at work. Typically in the past I’ve used fixtures in Django to get this done, but I usually have to align the planets and concoct the elixir of life to get the fixtures to migrate properly. It usually comes down to foreign key errors or something. This is something that should just be easy, but in my experience with Django, it never is.

There are a couple of posts on Stack Overflow with various scripts to convert content from SQLite to MySQL, but none of them lined up the planets just right.

Then I happened on this page, where there’s a Python script by this guy that isn’t based on the others. And it just worked. The script looks like this:

convert.py
#! /usr/bin/env python

import sys

def main():
    print "SET sql_mode='NO_BACKSLASH_ESCAPES';"
    for line in sys.stdin:
        processLine(line)

def processLine(line):
    if (
        line.startswith("PRAGMA") or
        line.startswith("BEGIN TRANSACTION;") or
        line.startswith("COMMIT;") or
        line.startswith("DELETE FROM sqlite_sequence;") or
        line.startswith("INSERT INTO \"sqlite_sequence\"")
    ):
        return
    line = line.replace("AUTOINCREMENT", "AUTO_INCREMENT")
    line = line.replace("DEFAULT 't'", "DEFAULT '1'")
    line = line.replace("DEFAULT 'f'", "DEFAULT '0'")
    line = line.replace(",'t'", ",'1'")
    line = line.replace(",'f'", ",'0'")
    in_string = False
    newLine = ''
    for c in line:
        if not in_string:
            if c == "'":
                in_string = True
            elif c == '"':
                newLine = newLine + '`'
                continue
        elif c == "'":
            in_string = False
        newLine = newLine + c
    print newLine

if __name__ == "__main__":
    main()

Usage

$ sqlite3 mydb.sqlite .dump | python convert.py > out.sql

Thank you, Behrang Noroozinia from internet land; you solved a long-standing problem of mine.

Networked local vagrant boxes for automation testing

This post is a bit of a ‘note to self’. I am tinkering with Vagrant boxes today trying to flesh out some Ansible configuration, and I need to get the boxes to talk to each other locally. I know about the Vagrant multi-machine setup, but I was already partly committed to having two individual boxes set up before I discovered it.

So, the trick is to set the network configuration in your Vagrantfile to “private_network”:

box-a
config.vm.network "private_network", ip: "192.168.2.4"
box-b
config.vm.network "private_network", ip: "192.168.2.5"

With the IPs set differently it seems to work, and the host is accessible as well. Note that my host subnet is 192.168.1.x.
Probably not the right way, but it works for now.

af

Dependency sets for pip

One of the things I enjoy about building projects with nodejs
is using npm, specifically the devDependencies part of
package.json. This allows you to have one set of dependencies that are
installed in production, but have extra dependencies installed for development,
such as test libraries, deploy tools, etc. To get the development dependencies
with npm you run:

$ npm install --dev
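
For context, the split lives in package.json and looks something like the following; the package names and versions here are just placeholders.

package.json
{
  "name": "my-app",
  "version": "0.0.1",
  "dependencies": {
    "express": "3.x"
  },
  "devDependencies": {
    "mocha": "1.x"
  }
}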

how about pip

It turns out if you are using pip 1.2 or newer, you can now do the same thing
in your setup.py file for Python packages.

An example setup.py file:

#!/usr/bin/env python

from setuptools import setup
from myproject import __version__

required = [
    'gevent',
    'flask',
    ...
]

extras = {
    'develop': [
        'Fabric',
        'nose',
    ]
}

setup(
    name="my-project",
    version=__version__,
    description="My awesome project.",
    packages=[
        "my_project"
    ],
    include_package_data=True,
    zip_safe=False,
    scripts=[
        'runmyproject',
    ],
    install_requires=required,
    extras_require=extras,
)

To install this normally (in “editable” mode) you’d run:

$ pip install -e .

To install the develop set of dependencies you can run:

$ pip install -e .[develop]

As you can see, you can have multiple sets of extra dependencies and call them
whatever you want.
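
For example, assuming a second, hypothetical group called docs, the extras dict could look like the following, and groups can be combined at install time with pip install -e .[develop,docs].

# A sketch of extras_require with two groups; 'docs' and its contents are
# hypothetical examples, not part of the project above.
extras = {
    'develop': [
        'Fabric',
        'nose',
    ],
    'docs': [
        'Sphinx',
    ],
}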

Have fun,
Aaron

setuptools, pip, and custom python index

In modern-day software development for the web, I find that we end up trying many different ways to deploy code. While we use Python as our primary programming language at work, I’ve enjoyed the node.js philosophy, especially the practice of Small Kernels of Functionality and Loosely Coupled Components.

From the article

“…why package two modules together if you can simply break them apart into two kernels of functionality which are codependent?”

Problem

One of the core sore points for me right now is the existence of “common” libraries in our work. It’s common to have a piece of code that is needed in the current project but doesn’t particularly belong there. The approach I often see is to create said “common” library and deploy it with all of the projects that need the code. The major resistance to putting this code in an individual package is probably the overhead of maintaining a separate repository for it, along with the pull/commit/push/tag/release cycle that comes with making changes to a still-developing module. So in the end, we end up with the “common” library.

The problem with this is many-fold, though:

  • dependency chains are not explicit,
  • the “common” library grows over time,
  • the same library becomes disorganized,
  • it’s not clear later on how to break things out because it’s not clear what projects are using what parts of the library,
  • the library with all these different pieces of functionality breaks the rule of single responsibility.

Back to the node.js philosophy: if you’ve ever used npm before, you know that there are tons and tons of modules available for node (as an interesting side note, npmjs module counts are growing by 94 modules/day at the time of writing [link]). The recommended approach is to keep modules small and publish them independently so they can be used explicitly across applications. James Halliday writes about this approach on his blog.

Back to Python

Python has been criticized for having painful package management. At work, we currently use setuptools for installing packages from GitHub, and it does a pretty decent job. As I’ve written before, you can specify dependency_links in the setup.py file to pull tarballs from any source control system that will provide them. Like I said, this works pretty well.

Mypi

I’ve recently set up a mypi private package index for our work, so we can start moving towards small, reusable Python packages. I’ve also looked at djangopypi and djangopypi2, the latter being a bootstrap-converted fork of the former. Both of these projects seem to add a little more functionality around user management, and of course they’re built on Django, which means you get the nice Django admin at the same time. I haven’t had time to do a full comparison; that will have to come later. For the time being, mypi seems to do the trick nicely.

Where setuptools falls apart

It turns out that, using pip, you can just specify a custom index in your ~/.pip/pip.conf, then pip install <packagename> and you’re good to go. That’s fine for installing one-off modules; however, automating the entire dependency installation process wasn’t obvious at first.
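
For reference, pointing pip at the private index is just a couple of lines in ~/.pip/pip.conf; the URL below is a made-up placeholder for wherever your mypi instance lives.

pip.conf
[global]
index-url = https://mypi.example.com/simple/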

Setuptools fail

My scenario had two projects, Project A and Project B. Project A relies on custom packages in my mypi index, and is published to that index as well. Project B has a single dependency on Project A. Using setuptools, python setup.py install would find Project A in the private package index (via dependency_links), but none of Project A’s custom index dependencies were being found, despite having specified the dependency_links in that project.

Long story longer (and the answer)

The answer just turned out to be a little bit more understanding of the evolution of python package management, specifically this little tidbit about pip:

“Internally, pip uses the setuptools package, and the pkg_resources module, which are available from the project, Setuptools.”

Turns out pip spits out the setuptools configuration (whatever you have in your setup.py) into a /<project-name>.egg-info/ folder, including dependency_links.

To get the pip equivalent of python setup.py develop just run:

# -e means 'editable'
$ pip install -e .

To get the same for python setup.py install run:

$ pip install .

The super-cool thing about this is that dependency_links no longer need to be set in the setup.py files as pip will use the custom index set up in the ~/.pip/pip.conf file.

Done and done

I think this solution will solve some of the problems of having all the git/GitHub overhead involved in releases. With a simple fab setup, release candidates and formal releases can be incremented and deployed in a way that feels a little cleaner and more independent of the git workflow, while still maintaining source control. I’m hoping it will encourage people to push modules early in a ‘shareable’ way to the private index so they can be easily installed by others. All in all, it feels cleaner to do it this way for me.

Hope that helps someone else down the road. Now we have a nice private registry for our python packages, and an easy way to automate their installation.

Note: It appears that djangopypi is actually maintained by Disqus; that may be a good reason to use the project, as it will probably be maintained for a longer period. I will explore that option and write up a comparison later.

On code and change...

Thoughts in general:

  • develop like the code is going to change, make sure it’s changeable, test it, refactor it, fail early, experiment and request feedback
  • deploy like it’s going to change, plan for scaling (even if you don’t have to), modularize, and automate whenever possible
  • create infrastructure like it’s going to change, adopt methods that let you pull out pieces and put in new ones.

Your setup is going to change, program like that’s the only thing that will remain constant.

On Testing

Testing is not about finding bugs. It’s not even really about making sure a piece of code works; you could do that yourself. Testing is about making sure you can change the code later.

On projects

When you set up a project, if you have no built-in or prescribed way of specifying the dependencies of the project, you’re going to find yourself pulling your hair out down the road. You are going to have to wipe it out and start from scratch at some point, and even if you don’t, you still need to replicate the setup for staging, production, and other developers. Save yourself some trouble: use the package management system of your language, or the one prescribed by the community. Setuptools for Python, npm for node, gem for Ruby. Manage your own packages with these. It will save you mondo time down the road. If your language doesn’t have package management, bail, find a language that does.

and a rant

If you’re using Perl, use cpanminus; cpan is a pain, and cpanm removes some of the pain. If you’re using PHP, well, god help you.

Learning Perl, List and Scalar context note

Today I ran across an interesting feature of Perl as I was trying to test a variable key against a hash, essentially a key search. I’m still surprised at how accessible things are in Perl; when I go to do something complicated, I find there is usually a very simple way to do it, clearly by design.

An Example

Searching against a set of keys in Perl is quite simple. Given the following hash:

%foo = (
    'bar' => 'baz',
    'biz' => 'bad',
    'bat' => 'buz'
);

… testing for a key match is quite simple:

$match = grep {/bar/} keys %foo;
print $match; # prints '1'

Up the ante

For a bit of code that I’m working on, though, I need a slightly more complicated search: I’m testing a dynamic string against the set of keys for a given hash. It turns out the answer is equally simple:

$search_string = 'bar';
$match = grep {/$search_string/} keys %foo;
print $match; # prints '1'

Pretty straightforward. This example doesn’t really give a lot of power just yet, mostly because the regular expressions are over-simplified. It was in my next example that the nature of list and scalar context started to become clear to me in Perl. According to this page on the perldoc site, “[grep] returns the list value consisting of those elements for which the expression evaluated to true. In scalar context, returns the number of times the expression was true.”

This is neat. In the first example below, we get the match count, and in the second, we get a list of matches from the same grep. The only difference is the context of the assignment, denoted by $ or @.

# get results in scalar context
$moo = grep {/^b/} keys %foo;
print $moo; # 3

# get results in list context
@moo = grep {/^b/} keys %foo;
print @moo; # batbarbiz

Scalar wut?

This article describes a scalar as the “most basic kind of variable in Perl… [they] hold both strings and numbers, and are remarkable in that strings and numbers are completely interchangeable.” Maybe remarkable if you are coming from C, or a strict language where dynamic typing is not allowed.

A little deeper

A scalar turns out to be just a variable, but one that can only hold a single value. If you need more than one value, you need a list; for a lookup, you need a hash. Scalar variables in Perl can be a number, a string, or even a reference to a more complicated type. In the case of an array reference or a hash reference, you can think of the scalar variable as a pointer to the original object, which can be merrily passed around at will.

%hash = ('foo'=>'bar'); # my hash
$ref = \%hash; # $ref is a hash reference

$obj->my_sub($ref); # pass the reference to a subroutine

This language is weird. Larry Wall must have thought he built the best thing ever when he created Perl. It feels a lot like vim, except that you have about 6-10 possible ways to do any given command. TIMTOWTDI is a great mindset for a language in terms of flexibility, but as soon as you get into someone else’s code, you realize you don’t know anything about the language again.

Until next time,
Af

On learning Perl and TDD

I want to start off by saying that Perl has to be one of the most fantastic languages I’ve ever had to work in. I’m not really learning it because I have a keen interest in Perl; I’m learning it to be helpful with the legacy codebase at my work.

A little grind goes a long way…

I wrote a bit of a script after spending a few weeks perusing a hefty codebase, and even with a little bit of Programming Perl under my belt, I still don’t have the skill to just roll out a lot of code off the top of my head. To make sure I was putting some test coverage in place (of which there is none in this particular project), I looked up Test::Simple and Test::More and started the file that would house my tests.

I found that after I had covered the existing code, I was looking at my new function stubs and wondering how best to describe what they were going to do. Without even really thinking about it, I started writing tests that said, “It should do this, or that,” and in a couple of minutes I had created a spec for the function I was writing.

Almost like fun

The neat thing is, having the spec in place allowed me to play with the code a
little bit to see if it was doing what I wanted when I tried different things.
If you recall, Perl has that “There Is Moar Than One Way To Do It”(TM) thing,
which can be a good and a bad thing, but more about that later.

The real fun came when I realized that I was actually doing Test Driven Development to learn Perl. TDD is something I’ve always thought would benefit my coding style, but I never really understood how until today.

Until next time,
Af