This is as much a note to self as a blog post, but I would also like to get blogging again.
Lambda integration vs. Lambda-Proxy
We just went through the process of setting up an AWS API Gateway proxy for Lambda again today, and API Gateway seems to have changed quite a bit recently.
One of the new Integration Request options is “Lambda Proxy”, which sends the full request to Lambda (including the URL info, query string, headers, and body) as opposed to just invoking the function with a mapped payload.
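To make that concrete, here is a minimal sketch of a proxy-style handler. The handler itself is just an illustration (not our actual function), but the event and response fields follow the documented Lambda proxy format:

import json

def handler(event, context):
    # With the Lambda proxy integration, API Gateway hands the whole request
    # to the function: method, path, query string, headers, and raw body.
    method = event.get('httpMethod')
    path = event.get('path')
    params = event.get('queryStringParameters') or {}

    # The function is also responsible for shaping the HTTP response itself.
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps({'method': method, 'path': path, 'params': params}),
    }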
if key is None:
    log.warn('Creating new key for {}'.format(site_name))
    key = bucket.new_key('{}/endpoint.json'.format(site_name))
    key.content_type = 'application/json'
At work we’ve been looking for a good solution to consolidate our service layer. We have around 30-40 backing services for our application and web tiers. The problem with the current setup is that they’re deployed all over the place: some are in EC2-land, some are colocated, and they’re written in various languages.
We’re working on consolidating our services into a unified architecture:
service stack (Python/WSGI)
load balancer
service cache
service proxy
…or something like that. I’m primarily interested in the first item today: service stack.
The problem with EC2
We absolutely love Amazon Web Services. Their services remove a lot of the headache from our day-to-day life. However, EC2 instances get pricey, especially when deploying a lot of services. Even if we put all of our services on their own micro instances (which isn’t a great idea) the hard cost is pretty clear… let’s assume 50 services at today’s prices:
50 t1.micro instances x ~$14/month = $700/month
This doesn’t seem like an astounding number, but to be fair, it probably isn’t an accurate estimate; we’re still forgetting high availability and load balancing, as well as snapshots. Assuming, for the sake of argument, that a t1.micro is adequate for every service, let’s factor in the new costs:
(50 t1.micro instances x 2 availability zones x $14/month) + (50 elastic load balancers x $14/month) = $2100/month
Having a high availability setup is starting to add up a bit, and there are still additional costs for EBS, snapshots, S3 (if you’re using it), and data transfer.
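Just to make the assumptions explicit, here is the same back-of-the-napkin math as a throwaway Python snippet. The ~$14/month figures are rough, and EBS, snapshots, S3, and data transfer are still left out:

# Rough monthly cost model; the prices here are approximations, not quotes.
INSTANCE_PRICE = 14      # t1.micro, USD/month (approx.)
ELB_PRICE = 14           # elastic load balancer, USD/month (approx.)
SERVICES = 50
AVAILABILITY_ZONES = 2

single_az = SERVICES * INSTANCE_PRICE
high_availability = (SERVICES * AVAILABILITY_ZONES * INSTANCE_PRICE
                     + SERVICES * ELB_PRICE)

print(single_az)          # 700
print(high_availability)  # 2100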
The case for Deis
Deis is a platform-as-a-service (PaaS) much like Heroku or Elastic Beanstalk. There are a few other open source solutions out there like Flynn.io (which, at the time of writing, is still in alpha) and dokku. Dokku will only run on a single instance, which doesn’t make it quite as appealing as Deis.
The whole idea behind Deis is that you can simply git push your application code to the cluster controller, and the platform will take care of the rest, including providing the hostname and options to scale. The solution is based on Docker, which makes it extra appealing.
Enter AWS EC2
There is a contrib setup for EC2 in the source repository, and it looks like it should work out of the box. It doesn’t appear to have support for AWS VPC just yet, but it is just using CloudFormation stacks behind the scenes, so the configuration should be pretty straightforward. Now, I don’t know about anyone else, but CloudFormation templates (example) make my brain hurt. AWS’s documentation is pretty clear for the most part, but sometimes it seems like you need a very specific configuration in about 6 different places for something to work. VPC is very much like this.
Fortunately, someone has already tackled this and there is a pull request open to merge it into Deis trunk. The pull request effectively adds a couple of configuration values that should allow Deis to operate in a VPC. Primarily, there are a couple of environment variables you need to set before provisioning the cluster.
I did run into one problem with this, however: when I specified the environment variables, I found my CloudFormation stack creation was failing (you can see this in the AWS console under the CloudFormation section; click on a stack and select the ‘Events’ tab below). The error I was getting was related to the VPC and the AutoScalingGroup not being created in the same availability zone, so I had to tweak the CloudFormation template under the Resources:CoreOSServerAutoScale:Properties:AvailabilityZones key to reflect my AZ (namely us-west-2b).
The IGW
Another problem I ran into (that had nothing to do with Deis) was that I was deploying to an availability zone that didn’t actually have access to the public internet (no internet gateway/NAT). This manifested as an error when trying to make run the cluster:
Failed creating job deis-registry.service: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
Once I got into the proper availability zone, the problem went away.
make run
I ran into some more issues when I finally got make run-ning… first, the “activation” part took a really long time, around 15 minutes, and then I got this error:
af:deis aaronfay$ make run
fleetctl --strict-host-key-checking=false submit registry/systemd/deis-registry.service logger/systemd/deis-logger.service cache/systemd/deis-cache.service database/systemd/deis-database.service
Starting 1 router(s)...
Job deis-router.1.service scheduled to 22e48bb9.../10.0.14.17
Starting Deis! Deis will be functional once all services are reported as running...
fleetctl --strict-host-key-checking=false start registry/systemd/deis-registry.service logger/systemd/deis-logger.service cache/systemd/deis-cache.service database/systemd/deis-database.service
Job deis-registry.service scheduled to 4cb60f67.../10.0.14.16
Job deis-logger.service scheduled to 22e48bb9.../10.0.14.17
Job deis-database.service scheduled to 4cb60f67.../10.0.14.16
Job deis-cache.service scheduled to bc34904c.../10.0.14.13
Waiting for deis-registry to start (this can take some time)...
Failed initializing SSH client: Timed out while initiating SSH connection
Status: Failed initializing SSH client: Timed out while initiating SSH connection
Failed initializing SSH client: Timed out while initiating SSH connection
One or more services failed! Check which services by running 'make status'
You can get detailed output with 'fleetctl status deis-servicename.service'
This usually indicates an error with Deis - please open an issue on GitHub or ask for help in IRC
One of the admins on the #deis IRC channel mentioned that you can just make run again with no problems; however, after several runs I still couldn’t get the command to complete without errors. make status pointed out the issue with the controller:
af:deis aaronfay$ make status
fleetctl --strict-host-key-checking=false list-units
UNIT                     LOAD    ACTIVE  SUB      DESC             MACHINE
deis-cache.service       loaded  active  running  deis-cache       bc34904c.../10.0.14.13
deis-controller.service  loaded  failed  failed   deis-controller  22e48bb9.../10.0.14.17
deis-database.service    loaded  active  running  deis-database    4cb60f67.../10.0.14.16
deis-logger.service      loaded  active  running  deis-logger      22e48bb9.../10.0.14.17
deis-registry.service    loaded  active  running  deis-registry    4cb60f67.../10.0.14.16
deis-router.1.service    loaded  active  running  deis-router      22e48bb9.../10.0.14.17
failed, hey? The same admin on the channel recommended fleetctl start deis-controller, although I think I’m using an older version of fleetctl (0.2.0?) and I actually had to run fleetctl start deis-controller.service. That appears to have worked:
af:deis aaronfay$ fleetctl start deis-controller.service
Job deis-controller.service scheduled to 22e48bb9.../10.0.14.17

af:deis aaronfay$ make status
fleetctl --strict-host-key-checking=false list-units
UNIT                     LOAD    ACTIVE  SUB      DESC             MACHINE
deis-cache.service       loaded  active  running  deis-cache       bc34904c.../10.0.14.13
deis-controller.service  loaded  active  running  deis-controller  22e48bb9.../10.0.14.17
deis-database.service    loaded  active  running  deis-database    4cb60f67.../10.0.14.16
deis-logger.service      loaded  active  running  deis-logger      22e48bb9.../10.0.14.17
deis-registry.service    loaded  active  running  deis-registry    4cb60f67.../10.0.14.16
deis-router.1.service    loaded  active  running  deis-router      22e48bb9.../10.0.14.17
So far so good
I now have Deis running in the VPC, well, the first bits anyway. I will follow up with a second part covering DNS configuration, initializing a cluster, and deploying an app.
I need to migrate some sqlite3 databases to MySQL for a couple of django-cms projects at work. Typically in the past I’ve used fixtures in Django to get this done, but I usually have to align the planets and concoct the elixir of life to get the fixtures to migrate properly. It usually has to do with foreign key errors or something. This is something that should just be easy, but in my experience with Django, it never is.
import sys

# Reads a sqlite3 dump on stdin and writes MySQL-compatible SQL to stdout
# (this is a Python 2 era script).
def main():
    print "SET sql_mode='NO_BACKSLASH_ESCAPES';"
    for line in sys.stdin:
        processLine(line)

def processLine(line):
    # Skip sqlite-specific statements that MySQL won't understand.
    if (
        line.startswith("PRAGMA") or
        line.startswith("BEGIN TRANSACTION;") or
        line.startswith("COMMIT;") or
        line.startswith("DELETE FROM sqlite_sequence;") or
        line.startswith("INSERT INTO \"sqlite_sequence\"")
    ):
        return
    # Translate sqlite keywords and boolean literals to MySQL equivalents.
    line = line.replace("AUTOINCREMENT", "AUTO_INCREMENT")
    line = line.replace("DEFAULT 't'", "DEFAULT '1'")
    line = line.replace("DEFAULT 'f'", "DEFAULT '0'")
    line = line.replace(",'t'", ",'1'")
    line = line.replace(",'f'", ",'0'")

    # Convert double-quoted identifiers to backticks, leaving anything inside
    # single-quoted string literals alone.
    in_string = False
    newLine = ''
    for c in line:
        if not in_string:
            if c == "'":
                in_string = True
            elif c == '"':
                newLine = newLine + '`'
                continue
        elif c == "'":
            in_string = False
        newLine = newLine + c
    print newLine

if __name__ == '__main__':
    main()
This post is a bit of a ‘note to self’. I am tinkering with Vagrant boxes today, trying to flesh out some Ansible, and I need to get the boxes to talk to each other locally. I know about the Vagrant multi-machine setup, but I was already partly committed to having 2 individual boxes set up before I discovered it.
So, the trick is, set the network configuration in your Vagrantfile to “private_network”:
With the IPs set differently it seems to work, and the host is accessible as well. Note that my host subnet is 192.168.1.x. Probably not the right way, but it works for now.
One of the things I enjoy about building projects with nodejs is using npm, specifically the devDependencies part of package.json. This allows you to have one set of dependencies that are installed in production, but have extra dependencies installed for development, such as test libraries, deploy tools, etc. To get the development dependencies with npm you run:
$ npm install --dev
How about pip?
It turns out if you are using pip 1.2 or newer, you can now do the same thing in your setup.py file for Python packages.
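I haven’t dug into every corner of it yet, but the general shape is setuptools’ extras_require; a hypothetical setup.py might declare its development-only dependencies something like this (the package and dependency names are made up):

from setuptools import setup

setup(
    name='my-service',          # hypothetical package name
    version='0.1.0',
    install_requires=[          # installed everywhere, like npm dependencies
        'requests',
    ],
    extras_require={            # roughly npm's devDependencies
        'dev': ['nose', 'fabric'],
    },
)

With a reasonably recent pip, something along the lines of pip install ".[dev]" should pull in the dev extras alongside the regular requirements.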
“…why package two modules together if you can simply break them apart into two kernels of functionality which are codependent?”
Problem
One of the core sore points for me right now is the existence of “common” libraries in our work. It’s common to have a piece of code that is needed in the current project but doesn’t particularly belong there. The approach I often see is to create said “common” library and deploy it with all of the projects that need the code. The major resistance to putting this code in an individual package is probably the overhead of maintaining a separate repository for it, along with the pull/commit/push/tag/release cycle required to make changes to a potentially still-developing module. So in the end, we end up with the “common” library.
The problem with this is many-fold though:
dependency chains are not explicit,
the “common” library grows over time,
the same library becomes disorganized,
it’s not clear later on how to break things out, because it’s not clear which projects are using which parts of the library.
Back to the node.js philosophy: if you’ve ever used npm before, you know that there are tons and tons of modules available for node (as an interesting sidenote, npmjs module counts are growing by 94 modules/day at the time of writing [link]). The recommended approach is to keep modules small and publish them independently so they can be used explicitly across applications. James Halliday writes about this approach on his blog.
I’ve also recently set up a mypi private package index for our work, so we can start moving towards small, reusable Python packages. I’ve looked at djangopypi and djangopypi2 as well, the latter being a bootstrap-converted fork of the former. Both of these projects seem to add a little more functionality around user management, and of course they’re built on Django, which means you get the nice Django admin at the same time. I haven’t had time to do a full comparison; that will have to come later. For the time being, mypi seems to do the trick nicely.
Where setuptools falls apart
Turns out, using pip, you can just specify a custom index in your ~/.pip/pip.conf and then pip install <packagename> and you’re good to go. That’s fine for installing one-off modules; however, automating the entire dependency installation process wasn’t obvious at first.
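For reference, the pip.conf side of it is tiny; something along these lines, where the index URL is just a placeholder for wherever your mypi instance lives:

[global]
index-url = http://pypi.internal.example.com/simple/

pip also understands extra-index-url if you want to keep falling back to the public PyPI.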
Setuptools fail
My scenario had 2 projects, Project A and Project B. Project A relies on custom packages in my mypi index, and is published to the index as well. Project B has a single dependency on Project A. Using setuptools’ python setup.py install would find Project A in the private package index (via dependency_links), but none of Project A’s custom index dependencies were being found, despite having specified dependency_links in that project.
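To make the failing setup concrete, Project A’s setup.py had dependency_links pointing at the private index, roughly like this (the package names and URL here are made up for illustration):

from setuptools import setup

# Project A's setup.py (names and URL are hypothetical)
setup(
    name='project-a',
    version='1.0.0',
    install_requires=['internal-utils'],
    dependency_links=[
        'http://pypi.internal.example.com/simple/internal-utils/',
    ],
)

Installing Project B with python setup.py install would resolve project-a through Project B’s own dependency_links, but Project A’s links were never followed to find internal-utils.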
“Internally, pip uses the setuptools package, and the pkg_resources module, which are available from the project, Setuptools.”
Turns out pip spits out the setuptools configuration (whatever you have in your setup.py) into a <project-name>.egg-info/ folder, including dependency_links.
To get the pip equivalent of python setup.py develop just run:
# -e means 'editable'
$ pip install -e .
To get the same for python setup.py install run:
$ pip install .
The super-cool thing about this is that dependency_links no longer need to be set in the setup.py files as pip will use the custom index set up in the ~/.pip/pip.conf file.
Done and done
I think this solution will solve some of the problems of having all the git/GitHub overhead involved in releases. With a simple fab setup, release candidates and formal releases can be incremented and deployed in a way that feels a little cleaner and independent of the git workflow, while still maintaining source control. I’m hoping it will encourage users to push modules early to the private index in a ‘sharable’ way so they can be easily installed by others. All in all, it feels cleaner to do it this way for me.
Hope that helps someone else down the road. Now we have a nice private registry for our python packages, and an easy way to automate their installation.
Note: It appears that djangopypi is actually maintained by Disqus; that may be a good reason to use the project, as it will probably be maintained for a longer period. I will explore that option and write up a comparison later.
develop like the code is going to change, make sure it’s changeable, test it, refactor it, fail early, experiment and request feedback
deploy like it’s going to change, plan for scaling (even if you don’t have to), modularize, and automate whenever possible
create infrastructure like it’s going to change, adopt methods that let you pull out pieces and put in new ones.
Your setup is going to change; program like that’s the only thing that will remain constant.
On Testing
Testing is not about finding bugs. It’s not even really about making sure a piece of code works; you could check that yourself. Testing is about making sure you can change the code later.
On projects
When you set up a project, if you have no built-in or prescribed way of specifying its dependencies, you’re going to find yourself pulling your hair out down the road. You are going to have to wipe it out and start from scratch at some point, and even if you don’t, you still need to replicate the setup for staging, production, and other developers. Save yourself some trouble: use the package management system of your language, or the one prescribed by the community. Setuptools for Python, npm for Node, gem for Ruby. Manage your own packages with these. It will save you mondo time down the road. If your language doesn’t have package management, bail and find a language that does.
and a rant
If you’re using Perl, use cpanminus; cpan is a pain, and cpanm removes some of that pain. If you’re using PHP, well, god help you.
Today I ran across an interesting feature of Perl as I was trying to test a variable key against a hash, essentially a key search. I’m still surprised at how accessible things are in Perl: when I go to do something complicated, I find there is usually a very simple way to do it, clearly by design.
An Example
Searching against a set of keys in Perl is quite simple, given the following hash:
For a bit of code that I’m working on, I need a slightly more complicated search though: I’m testing a dynamic string against the set of keys for a given hash, and it turns out the answer is equally simple:
Pretty straightforward. This example doesn’t really give a lot of power just yet, mostly because the regular expressions are over-simplified. It was in my next example that the nature of list and scalar context in Perl started to become clear to me. According to this page on the perldoc site, “[grep] returns the list value consisting of those elements for which the expression evaluated to true. In scalar context, returns the number of times the expression was true.”
This is neat. In the first example below we get the match count, and in the second we get a list of matches from the same grep. The only difference is the context assignment, denoted by $ or @.
# get results in scalar context
$moo = grep {/^b/} keys %foo;
print $moo;    # 3

# get results in list context
@moo = grep {/^b/} keys %foo;
print @moo;    # batbarbiz
Scalar wut?
This article describes a scalar as the “most basic kind of variable in Perl… [they] hold both strings and numbers, and are remarkable in that strings and numbers are completely interchangeable.” Maybe remarkable if you are coming from C, or a strict language where dynamic typing is not allowed.
A little deeper
A scalar turns out to be just a variable, but one that can only hold a single value. If you need more than one value, you need a list; for a lookup, you need a hash. Scalar variables in Perl can be a number, a string, or even a reference to a more complicated type. In the case of an array reference or a hash reference, you can think of the scalar variable as a pointer to the original object, which can be merrily passed around at will.
%hash = ('foo' => 'bar');  # my hash
$ref = \%hash;             # $ref is a hash reference
$obj->my_sub($ref); # pass the reference to a subroutine
This language is weird. Larry Wall must have thought he’d built the best thing ever when he created Perl. It feels a lot like vim, except that you have about 6-10 possible ways to do any given command. TIMTOWTDI is a great mindset for a language in terms of flexibility, but as soon as you get into someone else’s code, you realize you don’t know anything about the language again.
I want to start off by saying that Perl has to be one of the most fantastic languages I’ve ever had to work in. I’m not really learning it because I have a keen interest in Perl; I’m learning it to be helpful with the legacy codebase at my work.
A little grind goes a long way…
I wrote a bit of a script after spending a few weeks perusing a hefty codebase, and even with a little bit of Programming Perl under my belt, I still don’t have the skill to just roll out a lot of code off the top of my head. To make sure I was putting some test coverage in place (of which there is none in this particular project), I looked up Test::Simple and Test::More and started the file that would house my tests.
I found that after I had covered the existing code, I was looking at my new function stubs and wondering how best to describe what they were going to do. Without even really thinking about it, I started writing tests that said, “it should do this, or that”, and in a couple of minutes I had created a spec for the function I was writing.
Almost like fun
The neat thing is, having the spec in place allowed me to play with the code a little bit to see if it was doing what I wanted when I tried different things. If you recall, Perl has that “There Is Moar Than One Way To Do It”(TM) thing, which can be a good and a bad thing, but more about that later.
The real fun came when I realized that I was actually doing Test-Driven Development to learn Perl. TDD is something I’ve always thought would benefit my coding style, but I never really understood how until today.