Wading into Go

Continuing my journey, now it's time to talk about all the places I fell down. Go is so much like other languages that it's easy to write working code. However, it is not other languages. It's Go. Like all other languages, it has its own idioms and best practices. If you don't learn them, you'll be writing working code that others will find difficult to read and collaborate on.

Step 1: Make it work.

I started by taking a Python script I'd written a few years ago and rewriting it in Go. This is because it was complicated enough that it would be a bit of a challenge, I knew what the output should be, and it was a good candidate for adding concurrency. The actual program is unimportant for this blog post. Essentially, it read in fixed-width files of several different formats, parsed them, and combined the data into a single CSV file.

So I wrote it. I got it to work. It was about three times faster than the Python version. Success! But then I started to wonder. When I review Python code, more often than not my suggestions have more to do with style than functionality. Maybe I could get someone more familiar with Go to have a look at my Python and Go programs and let me know where my Go was ugly.

Step 2: Make it work right.

So I went to the mailing list and asked for help. Two people volunteered to help. I sent the Go and Python versions of the code and waited to be enlightened.

The information I received was far beyond what I ever hoped for. The things I learned are going to be fuel for more blog posts after this one -- posts that go far beyond any particular language and which have already improved my skills as a developer after 15 years of experience.

Step 3: Learn!

Here I'm just going to mention some corrections I received that are easy to write up. The "big important topics" will be discussed individually in more depth in future posts.

Capitalization

A feature of Go I'm coming to love is the expressive quality of capitalizing things. Capitalize a function in your package and it will be exported to other packages importing your package. If your function name begins with a lower-case letter, it won't. The same goes for constants, fields in structs, and interfaces.

I knew that already from reading. What I didn't know is that there are other implications to this that should be considered. For example, if you are trying to read or write XML, the field names in your structs must begin with a capital letter or they will not be read/written.
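A minimal sketch of the XML point, assuming a made-up `record` type with one exported and one unexported field (neither name comes from my actual program). The unexported field is skipped without any error:

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// record is a hypothetical type used only for illustration.
type record struct {
	Name string `xml:"name"` // exported: encoding/xml can see it
	note string // unexported: silently left out of the output
}

// marshalRecord returns the XML encoding of a sample record.
func marshalRecord() (string, error) {
	out, err := xml.Marshal(record{Name: "gopher", note: "lost"})
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	s, err := marshalRecord()
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(s) // <record><name>gopher</name></record>
}
```

The value of `note` never reaches the output, and nothing warns you about it.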

Also, a lot of Go code uses capitalization. I took this to be a style thing, and capitalized struct names and function names in my own code. However, this is a message to other developers that these things should be considered exportable. For a stand-alone program, Random Capitalization is misleading at best and confusing at worst.

One case that clearly illustrates this: I had a struct with a capitalized name. However, all the fields in the struct were lower-cased. So I had an "exportable" struct which had no usable fields. Nor did it have any exported methods attached. That doesn't make any sense.

If you open it, close it.

Okay, so this should be a no-brainer. I know better. I could make excuses about how I've been spoiled by the with statement in Python, but the truth is I didn't close files I opened, and I never opened enough files for it to teach me a lesson. Thankfully, I was gently reminded.

Go has the wonderful defer statement. If you open something, close it with defer immediately (after being sure the open statement doesn't return an error). This puts your Close right next to the Open so you (and everyone else) will never have to guess.
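Roughly, the pattern looks like this. It's a sketch, not my real program: `countLines` and the file name `input.txt` are invented for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// countLines opens path, defers the Close right after the error check,
// and counts the lines in the file.
func countLines(path string) (int, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err // nothing was opened, so there is nothing to close
	}
	defer f.Close() // runs when countLines returns, on every return path

	n := 0
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		n++
	}
	return n, sc.Err()
}

func main() {
	n, err := countLines("input.txt") // hypothetical file name
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n, "lines")
}
```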

Decomposition matters!

This is amazingly important, and will be discussed in more depth in another post. So I'll just give a brief overview here to get the point across.

For whatever reason (or combination of reasons), Python just "clicks" for me. I get it. It fits my brain. When I first came to Python, after maybe 10 years of writing (mostly bad) code, it felt immediately comfortable. Maybe it's me, but I think it's Python. To quote a developer who should need no introduction:

I noticed (allowing for pauses needed to look up new features in Programming Python) I was generating working code nearly as fast as I could type. When I realized this, I was quite startled.

Also:

When you're writing working code nearly as fast as you can type and your misstep rate is near zero, it generally means you've achieved mastery of the language. But that didn't make sense, because it was still day one and I was regularly pausing to look up new language and library features!

The point of all this is that I can always open up vim and just start typing, and come up with working code. I often do some reorganizing and refactoring as I go along, certainly. But the code that comes out the other end, in my opinion, is pretty good. I've come to realize that this is a terrible habit, and that I should always do a lot more thinking before opening an editor.

This is a huge topic and there's so much that can be said. In fact, there's already a lot that's been said far better than I could ever say it. So I'll just leave it with this for now:

Watch this talk! It's not explicitly about decomposition, but it is the thing that made decomposition make sense to me, and I now use a technique I learned from that video to think about my task before I write code. An excellent complement to this video is this PDF. Extra thanks go to Egon for expanding my mind with this and many other links to reading material that changed the way I write code.

The technique I developed from watching this video is to first think about how many different kinds of gophers you need. That will hint at how many pieces your task should be broken into, and what those pieces are. If you just start writing code you might achieve the same functionality. But you will end up with something difficult to explain, understand, and maintain.

Use init() functions.

All Go programs must have a main() function. This is where execution begins. This is a familiar concept from C. However, you may also have zero or more init() functions. These cannot be called explicitly. They run before main() is executed: within a single file in the order they appear, and across files in an order that depends on how the files are handed to the compiler, so it's best not to rely on it. This is very handy. Here are two examples.

Have a global (within your package) variable. Have an init() function which reads the command-line arguments with the flag package and sets that variable. Now, your main() (or any other function) can just start doing its job without having to mess around with the plumbing.

The way database drivers work in Go is confusing at first. You import Go's database/sql package. You also import a database driver package (for PostgreSQL, for example). But you never use the driver package. In fact, you have to explicitly import it with the blank identifier (_) as its name, because otherwise the unused import is a compile-time error. WTF? Well, the trick is that the database/sql package has an exported Register function which a database driver uses to register itself. The database driver package (from which nothing is ever called in your code) has an init() function which calls sql.Register, adding its capabilities into the standard library's database/sql package. This means that when you use database/sql, it works the same way with any database driver you use, and you don't have to familiarize yourself with much about different drivers. That is really cool.
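To make the registration trick concrete, here's a sketch with a made-up driver. `stubDriver`, the name "stub", and the import path in the comment are all invented. A real driver does the same thing from its own package, which is why the blank import is enough.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// stubDriver is a hypothetical driver that cannot connect to anything.
type stubDriver struct{}

// Open satisfies driver.Driver; a real driver would return a live connection.
func (stubDriver) Open(name string) (driver.Conn, error) {
	return nil, errors.New("stub driver: no real database behind this")
}

// In a real driver this init() lives in the driver's own package and runs
// because of the blank import, e.g. import _ "example.com/some/driver"
// (a made-up path).
func init() {
	sql.Register("stub", stubDriver{})
}

// registered reports whether a driver name is known to database/sql.
func registered(name string) bool {
	for _, d := range sql.Drivers() {
		if d == name {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(registered("stub"))    // true
	fmt.Println(registered("missing")) // false
}
```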

Another thing about init() functions is that they're guaranteed to only be run once. So it's a natural place for doing things like reading command-line arguments.
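If you want to see when init() runs, a tiny experiment helps. This sketch (all names invented) records each step in a slice. Package-level variables are initialized first, then the init() functions of the file in the order they appear, then main().

```go
package main

import (
	"fmt"
	"strings"
)

// trace records initialization steps in the order they happen.
var trace []string

// ready gets its value before any init() function runs.
var ready = step("package variable")

// step appends name to trace; the names are purely illustrative.
func step(name string) bool {
	trace = append(trace, name)
	return true
}

func init() { step("first init") }

func init() { step("second init") }

func main() {
	step("main")
	// package variable -> first init -> second init -> main
	fmt.Println(strings.Join(trace, " -> "))
}
```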

Be aware of Unicode issues

This is also something everyone should know. In this case, my code was dealing with ASCII-only input in the fixed-width files. It was pointed out to me that I wasn't dealing with Unicode. That was true, but it was intentional in this case. Still, it's well worth mentioning: although Go uses UTF-8 strings by default, you need to be aware of when you're dealing with strings and when you're dealing with bytes, and of when using strings in certain ways opens you up to unintentionally splitting up a multi-byte character.
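Here's a small sketch of how that splitting happens. The string and the helper `firstN` are made up for illustration. Indexing and slicing a string work on bytes, so a slice can land in the middle of a character.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstN returns the first n characters (runes) of s, not the first n bytes.
// It assumes n is not negative.
func firstN(s string, n int) string {
	r := []rune(s)
	if n > len(r) {
		n = len(r)
	}
	return string(r[:n])
}

func main() {
	s := "h\u00e9llo" // "héllo": the é takes two bytes in UTF-8

	fmt.Println(len(s))                    // 6 (bytes)
	fmt.Println(utf8.RuneCountInString(s)) // 5 (characters)
	fmt.Println(utf8.ValidString(s[:2]))   // false: the slice cuts é in half
	fmt.Println(firstN(s, 2) == "h\u00e9") // true
}
```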

Read up on packages!

This is important with any language. Understand the standard library. In one case, I was using the os package to open a directory and read all its contents. Then I had another function which, given a list of those contents, would return just the filenames matching a prefix. Congratulations! I just re-implemented the Glob function from path/filepath. This is certainly not a wheel I should have wasted my time reinventing.
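For comparison, the whole "list a directory, keep the names with a prefix" job fits in one call. This is a sketch: `matchPrefix` and the prefix `report_` are invented, and it assumes the prefix contains no glob metacharacters such as `*`, `?`, or `[`.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// matchPrefix returns the paths in dir whose names start with prefix.
// It assumes prefix contains no glob metacharacters.
func matchPrefix(dir, prefix string) ([]string, error) {
	return filepath.Glob(filepath.Join(dir, prefix+"*"))
}

func main() {
	files, err := matchPrefix(".", "report_") // hypothetical prefix
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```

Note that filepath.Glob only reports a malformed pattern as an error. A directory that can't be read simply yields no matches.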

I had it in my head that Go is a "low-level language." For some reason I assumed that this meant I'd be given the bare essentials and was expected to combine them to do even basic things. This is not true. The standard library is amazing.

Another important note is that Go programmers recommend reading the source of the standard library as a great way to learn to write idiomatic Go. It's well-documented and well-written. I have started this process, and in some cases I understand what Go does "under the hood" better than the Python equivalent.

Don't hide main()

If your package has a bunch of files in it, put main() in the file named main.go. Don't make people go hunting.

Worry about allocation, not declaration.

One of the nicest things about Go is the "shortcut" variable declaration. Instead of this:

var foo int
foo = 42

you can do this:

foo := 42

This means that Go will automatically determine the type of foo from the integer 42. It's considered undesirable in Go to declare a bunch of variables up at the top. They should be declared close to (preferably when) they're used.

I was pre-declaring a variable because I was using it in a loop, and thought that using the := declaration in a loop would mean I was going to re-create the variable for every iteration. Apparently I was engaging in some false premature optimization.
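A sketch of the style that was suggested to me, with an invented `totalLength` function. The variable is declared with := inside the loop, right where it's used. As I understand it, where the value actually lives is the compiler's decision, so writing it this way doesn't by itself cost an allocation per iteration.

```go
package main

import (
	"fmt"
	"strings"
)

// totalLength sums the lengths of the lines after trimming whitespace.
func totalLength(lines []string) int {
	total := 0
	for _, line := range lines {
		trimmed := strings.TrimSpace(line) // declared where it is used
		total += len(trimmed)
	}
	return total
}

func main() {
	fmt.Println(totalLength([]string{" ab ", "c"})) // 3
}
```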

Local constants

All the examples I've seen online and in books demonstrated constants by creating them before any functions. This makes sense. However, if only one function uses the constant, it's better to declare that constant inside the function.
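For example, something like this, where the column offsets and the function name are made up and the input is assumed to be ASCII:

```go
package main

import (
	"fmt"
	"strings"
)

// nameField extracts the name column from a fixed-width, ASCII-only line.
func nameField(line string) string {
	// Hypothetical column offsets, needed only by this function.
	const nameStart, nameEnd = 0, 10

	if len(line) < nameEnd {
		return strings.TrimSpace(line)
	}
	return strings.TrimSpace(line[nameStart:nameEnd])
}

func main() {
	fmt.Println(nameField("Alice     42")) // Alice
}
```

Anyone reading nameField sees the constants right next to the only code that uses them.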

Check your errors!

If the function you call returns an error, check it. It's obvious with things like opening a file. I was failing to do this when writing lines to my output file.
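A sketch of what checking write errors can look like. `writeLines` and `brokenWriter` are invented for the example, with brokenWriter standing in for a full disk. With a buffered writer, the error may not show up until Flush, so that return value matters too.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"os"
)

// writeLines writes each line to w and reports the first error it hits.
func writeLines(w io.Writer, lines []string) error {
	bw := bufio.NewWriter(w)
	for _, line := range lines {
		if _, err := bw.WriteString(line + "\n"); err != nil {
			return err
		}
	}
	return bw.Flush() // buffered data can still fail to reach w
}

// brokenWriter is a hypothetical writer that always fails.
type brokenWriter struct{}

func (brokenWriter) Write(p []byte) (int, error) {
	return 0, errors.New("disk full")
}

func main() {
	lines := []string{"a,b", "c,d"}
	if err := writeLines(os.Stdout, lines); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
	}
	if err := writeLines(brokenWriter{}, lines); err != nil {
		fmt.Fprintln(os.Stderr, "write failed:", err)
	}
}
```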

Worry about allocation.

Go is very easy to use. It insulates the programmer from a lot. Consider strings. Go has them. In C, you just have characters. You have to put them in an array to pretend you have a string. That means you have to know how big the array is before you write to it. That means no strings of arbitrary length. If you find yourself needing to write a string longer than your array, you have to stop and allocate new memory for your new, larger array.

However, in reality Go is still dealing with the same idea under the covers. When you create a map or a slice you can specify a size. You don't have to, but if you know you're going to be putting a lot of stuff in it, or fill it in a loop with an unknown number of iterations, it will help performance not to make the Go runtime allocate memory more often than it needs to.
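A sketch of both cases, with invented function names. make takes a capacity for a slice and a size hint for a map. Both are optional, and both only help when you have a reasonable idea of how much data is coming.

```go
package main

import "fmt"

// squares returns the first n squares, allocating the backing array once.
func squares(n int) []int {
	out := make([]int, 0, n) // length 0, capacity n: append never has to grow it
	for i := 0; i < n; i++ {
		out = append(out, i*i)
	}
	return out
}

// index maps each word to its position, with a size hint for the map.
func index(words []string) map[string]int {
	m := make(map[string]int, len(words)) // hint: room for len(words) entries
	for i, w := range words {
		m[w] = i
	}
	return m
}

func main() {
	fmt.Println(squares(4))                      // [0 1 4 9]
	fmt.Println(index([]string{"a", "b"})["b"]) // 1
}
```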

Step 4: Profit!

Learn more Go and you'll write better Go. Write better Go and you'll be so happy you'll write more Go. Write more Go and you'll write better Go.

Profit!
