joepie91's Ramblings


The Python documentation is bad, and you should feel bad.

19 Feb 2013

Python is quite often hailed as a great language for learning to program, due to its simple and often natural-language-like syntax. But there’s one big issue that many Python developers conveniently overlook: the documentation.

While PHP is no doubt a terribly inconsistent language that has some very bad language design, the documentation is, perhaps ironically, very good. It misses some information here and there, and even has some incorrect information in it, but overall it’s a very valuable learning resource - especially for people that are new to programming.

In this article, I’ll go into the most important differences between PHP and Python documentation, and how this is seriously affecting the adoption of Python.

Let’s start out with a simple example. Say you are a developer that just started using PHP, and you want to know how to get the current length of an array. You fire up a browser and Google for “PHP array length site:php.net”. The first result is spot-on, and one minute later, you know that count($arr) will suffice.

Now let’s say that you wish to do the same in Python. In this case, you would Google for “Python list length site:docs.python.org”, and the first result is... a page with several chapters on standard types? It’s entirely unclear how to get the length of a list, and you’ll have to scan through a giant amount of text - even ctrl+F will not help you much here - to figure out this very basic operation. Five to ten minutes later, you know that len(lst) is how to do it.
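For reference, the operation in question looks like this. It is a minimal sketch; the list name `lst` and its contents are made up purely for illustration.

```python
# A hypothetical list; the name and contents are only for illustration.
lst = ["a", "b", "c"]

# len() is a built-in function that returns the number of items in a container.
length = len(lst)
print(length)
```

The same built-in works on strings, tuples, dictionaries and sets, which is part of why it is documented with the built-in functions rather than on a list-specific page.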

Note that I have added domain restrictions to both queries, to limit the search to the official documentation - after all, that's what I am discussing here.

This example immediately shows the first issue with the Python documentation: the organization and resulting Googleability. As opposed to the PHP documentation - which is nicely segmented into separate pages for each function and concept - the Python documentation is written like a book, with chapters and paragraphs. Ever tried to look up the exact point where something happened in a novel? It doesn't work.

When you Google for something, you will end up on a page that explains a lot of things, including what you’re looking for. But how are you supposed to know where on the page it is, or whether it’s even on the page at all? The problem here is that the particular operation you are trying to find documentation on does not have its own page.

I frequently bring up the above example in conversations with others about the Python documentation. A common response to that is “but everyone knows how to get the length of a list, that doesn’t need its own page!” But consider this: if the documentation doesn’t explain how to use the language - especially the core components of it! - then what was the purpose of the documentation to start with? The whole point of the documentation is to explain things to people that do not understand them yet. Which brings us to the second issue...

The Python community

Update, February 19: Several people have pointed out that Python developers on Reddit and StackOverflow in particular do not really fit the following section, and I have to agree with that. This section refers primarily to communities that consist mostly of Python developers, and do not have a very distinct culture of their own. The two examples given by others - Reddit and StackOverflow - both have a very distinct and 'unusual' atmosphere in general, also outside the Python subreddit/category.

I will no doubt piss off quite a few people with this statement, but the community around Python is one of the most hostile and unhelpful communities around any programming-related topic that I have ever seen - and with that I am not just referring to #python on Freenode, but to communities with a dense population of Python developers in general. This point actually consists of several separate attitudes and issues.

The general norm for the Python community appears to be that if you are not already familiar with the language, you do not deserve help. If you do something in a less-than-optimal way, other Python developers will shout about how horrible you are without bothering to explain much about what you did wrong. When you ask out of curiosity how a certain thing works, and that thing is considered a bad practice, you will get flamed like there’s no tomorrow - even if you had no intention of ever implementing it.

Another issue is the very strong element of fanboyism around various approaches and particular modules. Try asking any kind of question about sockets, and the standard response will be “use Twisted”. Try explaining why you do not wish to use Twisted, and no matter how valid the reason, the response will be along the lines of “use Twisted anyway”. And it’s not like these are some random people shouting stuff - no-one will even answer the question you asked in the first place, because everyone is too busy telling you to use Twisted.

But one issue in particular bothers me to no end: the assumption that source code is a reasonable replacement for documentation. The documentation on module X is bad? Just read the source. Want to know how the Python interpreter deals with input Y? Read the source. And so on, and so on.

Isn’t the purpose of tools to make your job easier, and less time-consuming? Isn’t the purpose of a higher level language to abstract away the lower level things you do not want to have to worry about? Then how is it acceptable to expect someone to invest large amounts of time into reading the source code of something to understand how to use it?

Would you expect a plumber to know the exact manufacturing process of his wrench?

But perhaps the biggest issue with the Python community is the ostrich mentality. All of the issues mentioned in this article so far - and all those that will be mentioned later on - are conveniently ignored, waved away, or justified by many Python developers whenever they are brought up. The ‘read the source’ mentality is, in fact, often a clear example of this ostrich mentality - instead of working on fixing the documentation, the lack of good documentation is justified by saying that “you can read the source anyway”.

Incomplete documentation

Let’s do an experiment. Think of a random function in a random standard library module in Python, visit its documentation entry, and try to find all the error conditions (return values, exceptions, when they happen, ...) without scrolling the page. Didn’t work? That’s not very surprising.

The Python documentation is incomplete. While not always incomplete in the sense of not carrying all the information, it’s very often incomplete in the sense of not carrying information in all the right places. When you go to look up any function in any Python module, standard library or not, you should have an immediate overview of the accepted arguments, the return values, the error conditions, and when these occur.
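To make that concrete, here is a rough sketch of what such an at-a-glance overview could look like for a single function. The function `parse_port` is hypothetical and not part of any real module, and the docstring layout is just one possible convention, not an official one.

```python
def parse_port(text):
    """Convert a string to a TCP port number.

    Arguments:
        text -- a string containing a decimal integer, e.g. "8080".

    Returns:
        The port as an int in the range 1-65535.

    Raises:
        TypeError  -- if text is not a string.
        ValueError -- if text is not a decimal integer, or is out of range.
    """
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    port = int(text)  # int() raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError("port out of range: %d" % port)
    return port


print(parse_port("8080"))
```

Everything a caller needs for error handling (which exceptions, and when) sits directly next to the description, instead of being spread over the rest of the page.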

Again, PHP shines here, having all of the above in a standardized format, for nearly every single function in PHP. To figure out the error conditions for a Python function, you’ll first have to read the function description blurb, then scroll to the top of the chapter, the top of the page, the bottom of the chapter, and the bottom of the page - it may be in any of these places. If you’re very unlucky, the information is not on the page at all, and you have to either Google for it - or even try all permutations of input you can think of, in an interactive shell.
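The "try every input in an interactive shell" fallback tends to look something like the sketch below. The helper `probe` is a hypothetical name invented for this illustration, and the built-in `int` is only used as a stand-in for whatever function you are trying to figure out.

```python
def probe(func, inputs):
    """Call func on each input and record the result or the exception type raised."""
    outcomes = []
    for value in inputs:
        try:
            outcomes.append((value, "returned", func(value)))
        except Exception as exc:
            # Record only the exception class name; that is what you would catch.
            outcomes.append((value, "raised", type(exc).__name__))
    return outcomes


# Feed a few guesses to int() and see what comes back.
results = probe(int, ["42", "abc", None])
for row in results:
    print(row)
```

This works, but it only tells you about the inputs you happened to think of, which is exactly why it is a poor substitute for documented error conditions.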

Error handling is important; that is pretty much universally accepted in the Python community. But if error handling is so important, why are you not giving people the tools and information they need to do it?

The documentation is unclear

Yet another problem. In many cases, the Python documentation is simply unclear. Natural language is ambiguous by nature - many sentences can be interpreted in more than one way. This is an absolutely deadly situation for documentation of a programming language, where you can blow up your entire project by doing one thing wrong.

PHP solves this by having examples for every single function and class. If you’re not sure what is meant by a certain sentence in the description, you just look at one of the included examples, and all ambiguity is removed. It’s immediately obvious how to use things.

In the Python documentation, examples are extremely scarce. If examples are given at all, they are often incomplete, unclear, or lack initialization code or context. More examples are necessary.

Why is all this important?

Now, by the time you’ve reached this point in the article, you may be thinking to yourself: “but I’m doing fine, I have no issues with the documentation as it is, you’re just whining!”

Think about this for a moment. If you are not only reading this article, but you are sufficiently pissed off by it to think something like this - does that not mean that you are already experienced enough not to need solid documentation? If you are an experienced developer, then you are most likely in a very bad position to judge how beginner-friendly the documentation for a language is.

On the one hand, the Python community is trying to actively ‘spread’ the word, and is telling people that Python is ‘so easy to learn’. On the other hand, both the documentation and the community are very newbie-hostile and unhelpful. Don’t you see a problem with this?

If you wish more people would use Python, then start making it possible for them to do so. Restructure the documentation. Think twice about how you respond to a newbie. Respect someone’s reasons for wanting to do things themselves - as long as you inform them of the associated risks or problems, in an informative manner. Turn Python from a language that pretends to be beginner-friendly into an actually beginner-friendly language.

Most of all, accept that your personal experiences with Python, as an experienced developer, are not worth very much. Listen to the newbies when they tell you the documentation is hard to read or find stuff in.

No-one is fixing this

Some of these problems are very obvious. Others are less obvious, but still exist. With the number of issues in the Python documentation and community, you’d expect at least some kind of effort to be underway to fix these... but as far as I can determine, there is not a single person or group of people working on fixing these problems. Why not?

This issue will not solve itself. The only way this issue can be solved is through a cooperative effort by every Python developer who can spare a few minutes. If even half of the experienced Python developers in the general Python community rewrote the documentation for one function or concept, we would have absolutely golden documentation in less than a month.

Make it happen. Recognize that it’s broken, and fix it. That’s what developers do.

tl;dr

As this is a tl;dr, it does not include in-depth elaboration and just serves as an overview of the points made in this article. If you'd like to respond to any of these points, please take the time to read the corresponding explanation above first, so you have a complete understanding of the point I am trying to make.

UPDATE: A reader requested that I provide an example of how the Python documentation could be improved. It took me a while to get around to it, but here goes - I've decided to rewrite the documentation for os.path.walk() as an example. The original documentation can be found here, my suggestion can be found here. My version is written using ZippyDoc, the source of the documentation page can be found here.