daemonl

In case you were wondering...

My poor use of Go's defer woke me up. Repeatedly.

I'm the sole dev-ops guy for two tech startups. I'm a 'pretty good' developer, not a perfect one, and I often follow best practice, but not always.

This weekend, the servers won.

It all started with two lines of Go code:

defer rows.Close()

and

defer db.Close()

I'd just deleted a library I created early in my Go days which handled a database connection pool. I haven't checked whether I just wasn't looking or whether it didn't exist back then, but either way, pooling is now handled by the built-in database/sql package. After doing some reading, it was clear that I should just open one database handle and it would pool its connections itself.

So I deleted my library, and updated the codebases for both of my startups to work without it. (Yes, in that order)

Instead of taking and releasing connections from my pool, I now have, for one of the startups, a persistent database handle which is passed around to the functions needing it, and for the other, a connection built per request (it has to be, as it's multi-tenanted, with different MySQL credentials per user group).

All testing passed, mainly because my tests are lightweight, so I deployed. Both.

A few hours later, my phone informed me, repeatedly, '1203: Too Many Connections'

This was in the 'one single connection' one, so... I was a little confused.

As it happens, the Go code (either the database/sql package or the MySQL driver) opens a new connection whenever it needs one and the old ones are still in use. When it tries to open connection number max_connections + 1, it doesn't handle the MySQL error by waiting; it returns the error to the caller. (Which is probably better than a deadlock...)
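One knob that speaks to this behaviour: database/sql has SetMaxOpenConns, which caps the pool so that extra callers wait for a free connection rather than opening another one. What follows is only a minimal sketch, assuming a recent Go version (db.Conn needs Go 1.9 or later). The driver named fake, its types and the cappedPool function are all made up for illustration, so nothing here talks to MySQL.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"time"
)

// fakeDriver is a hypothetical in-memory driver so the sketch runs without a server.
type fakeDriver struct{}

func (fakeDriver) Open(name string) (driver.Conn, error) { return &fakeConn{}, nil }

// fakeConn does nothing useful; it only exists so the pool has something to count.
type fakeConn struct{}

func (*fakeConn) Prepare(query string) (driver.Stmt, error) {
	return nil, errors.New("fake driver: Prepare not implemented")
}
func (*fakeConn) Close() error { return nil }
func (*fakeConn) Begin() (driver.Tx, error) {
	return nil, errors.New("fake driver: Begin not implemented")
}

func init() { sql.Register("fake", fakeDriver{}) }

// cappedPool limits the pool to two connections, holds both, then asks for a third
// with a short timeout. It reports how many connections are open and whether the
// third request gave up waiting.
func cappedPool() (int, bool, error) {
	db, err := sql.Open("fake", "")
	if err != nil {
		return 0, false, err
	}
	defer db.Close()
	db.SetMaxOpenConns(2)

	ctx := context.Background()
	c1, err := db.Conn(ctx)
	if err != nil {
		return 0, false, err
	}
	defer c1.Close()
	c2, err := db.Conn(ctx)
	if err != nil {
		return 0, false, err
	}
	defer c2.Close()

	// The pool is full, so this call blocks until the context expires.
	short, cancel := context.WithTimeout(ctx, 50*time.Millisecond)
	defer cancel()
	_, waitErr := db.Conn(short)
	return db.Stats().OpenConnections, errors.Is(waitErr, context.DeadlineExceeded), nil
}

func main() {
	open, waited, err := cappedPool()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("open connections:", open)
	fmt.Println("third request timed out waiting:", waited)
}
```

The trade-off is the one hinted at above: with a cap, a leak shows up as requests hanging rather than as a loud error, so a timeout on the waiting side is probably worth having.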

But why did I have so many connections? I must be leaving some rows open.

What I found:

for ... {
    rows, err := db.Query(`...`, ...)
    if err != nil {
        ...
    }
    defer rows.Close()
    ...
}

In my haste, I hadn't noticed that I was queueing up every rows.Close() call until the end of the function, which comes after the for loop. Each iteration held on to another connection, and the loop ran well past the allowed connection limit (about 100 in this case).
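A minimal sketch of the trap and one common way out of it, using a made-up fakeRows type in place of real query results so it runs without a database. The names (fakeRows, query, leaky, fixed, peakOf) are hypothetical and only there for illustration.

```go
package main

import "fmt"

// open counts resources currently held; peak records the high-water mark.
var open, peak int

// fakeRows is a hypothetical stand-in for the rows returned by a query.
type fakeRows struct{}

// query pretends to run a query and takes one resource from the "pool".
func query() *fakeRows {
	open++
	if open > peak {
		peak = open
	}
	return &fakeRows{}
}

// Close gives the resource back.
func (r *fakeRows) Close() { open-- }

// leaky defers Close inside the loop, so nothing is released until leaky returns.
func leaky(n int) {
	for i := 0; i < n; i++ {
		rows := query()
		defer rows.Close()
	}
}

// fixed wraps each iteration in a function literal, so the defer fires per iteration.
func fixed(n int) {
	for i := 0; i < n; i++ {
		func() {
			rows := query()
			defer rows.Close()
		}()
	}
}

// peakOf resets the counters, runs f, and reports the most resources held at once.
func peakOf(f func(int), n int) int {
	open, peak = 0, 0
	f(n)
	return peak
}

func main() {
	fmt.Println("leaky peak:", peakOf(leaky, 100))
	fmt.Println("fixed peak:", peakOf(fixed, 100))
}
```

Calling rows.Close() explicitly at the end of each iteration would also work; the function literal just keeps the defer, so an early return inside the iteration still releases the rows.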

git checkout HEAD~1
go build ...

Oh no, I deleted the library. Why did I delete the library? What was I trying to prove?

Well, the only way is forward...

Fix code, push, test, take laptop with me wherever I go just in case.

Startup 2, 4am... 'null pointer reference'

Huh? That's new. I'll just check the log.

So, Heartbleed, that was a while ago. All of my servers are patched, and all keys revoked and renewed. Including, just for kicks, all SSH client keys.

Grab the iPad, log in to the server... key not authorised. Oh, that's right. Didn't get around to putting a new key on the iPad yet.

Grab the laptop, oh, right, that's in the car so I could quickfix the last bug.

Grab another laptop and my USB key - now we are talking. SSH into the web server, all good, SSH from there into the prod server... key not authorised. Oh, right, my SSH config has it set up to use the key on my laptop to do the tunnel thing. What's that called again... google, google, google, got it, fix the config, SSH into the prod server, sudo less /var/log/upstart/app.log, 99999... oh, this log ACTUALLY has 99999 lines. Waiting, waiting (note to self: google the command to jump to the end, there must be one).

The stack trace tells me that I'm using a closed database connection in my email function...

Look up the code. What I found:

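// Request.go:

func HandleRequest(authBaton authBaton) error {

    db, err := core.OpenNewDatabaseConnection(authBaton)
    if err != nil {
        return err
    }
    defer db.Close() // runs as soon as HandleRequest returns

    // [ Do request things ]

    // The bug: this goroutine can outlive HandleRequest and still hold db.
    go mailer.SendMail(db, templateName, data)

    // [ Do more request things ]

    return nil
}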

Yes, the deferred db.Close() runs when HandleRequest returns, which, most of the time, is before the goroutine gets around to using the now-closed database connection.
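To make the ordering concrete, here is a minimal sketch with a made-up fakeDB type standing in for the real handle. It forces the unlucky ordering with a channel so the failure is reproducible, then shows one possible fix, where the goroutine opens and closes a handle of its own. All the names here (fakeDB, sendMail, shared, owned) are hypothetical, and this is not necessarily the fix that went into production.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeDB is a hypothetical stand-in for a per-request database handle.
type fakeDB struct{ closed bool }

func (d *fakeDB) Close() { d.closed = true }

// Exec fails once the handle has been closed.
func (d *fakeDB) Exec() error {
	if d.closed {
		return errors.New("database is closed")
	}
	return nil
}

// sendMail stands in for the mailer; all it does here is touch the database.
func sendMail(db *fakeDB) error { return db.Exec() }

// shared mimics the bug: the goroutine borrows the request's handle, and the
// deferred Close has already run by the time the mail goes out.
func shared() error {
	result := make(chan error, 1)
	returned := make(chan struct{})
	func() { // plays the part of the request handler
		db := &fakeDB{}
		defer db.Close()
		go func() {
			<-returned // force the unlucky ordering: the handler returns first
			result <- sendMail(db)
		}()
	}()
	close(returned)
	return <-result
}

// owned is one possible fix: the goroutine opens and closes its own handle.
func owned() error {
	result := make(chan error, 1)
	returned := make(chan struct{})
	func() { // plays the part of the request handler
		db := &fakeDB{}
		defer db.Close()
		go func() {
			<-returned
			mailDB := &fakeDB{}
			defer mailDB.Close()
			result <- sendMail(mailDB)
		}()
	}()
	close(returned)
	return <-result
}

func main() {
	fmt.Println("shared handle:", shared())
	fmt.Println("own handle:   ", owned())
}
```

Another option would be to have the handler wait for the mailer before returning, for example with a sync.WaitGroup, at the cost of making the request as slow as the email.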

So what's the lesson here?

I need, as I already knew I needed:

  • Better test coverage
  • A rollback strategy
  • Upstart respawn (just... not there, I didn't even add it...)
  • More time for manual tests
  • An ops team
  • An iPad RSA key

I probably don't have time for all of those (I don't have time to not... sure, got it, I'll get to that when I have time as well).

But hey, that's the nature of fake-it-till-you-make-it, solo, unfunded tech startups... Isn't it?
