The good old 1955s


We divide our time up into regular blocks. Seconds, minutes, hours, days, weeks, months, years, decades, centuries, millennia.

When does one block end and the next start?  With most of our divisions of time, the decision is arbitrary.  January 1 is arbitrary.  The start of a new decade, or a new century, is decided by the year we have arbitrarily decided to call Year 1 AD.

So much might be obvious.  The decade of the 1980s is only as valid as the decade 1984–1993.  The 20th century is only as extant as  the century running 1873–1972.

But we attach things to our arbitrary periods of time.  The 1940s was a decade of war; the 1970s was the decade of disco.  The fifteenth century was the last in the era of knights and maidens; the sixteenth was the first in the era of art and science.

All this raises an interesting question.  What would the decade 1975–84 look like?  Would this decade capture the disco of the 70s, or the hair of the 80s?  What would the century 1850–1949 look like?  Would this be the century of world war, in which case we have lived not 12, but 62 years since that violent century?

Let’s call that decade the 75s, and that century the 19.5th century.  Write in and tell me what these new unexplored decades would look like.

Is a crime an occupation?


There are kinds of activity for which we have names not only for the activity but also for its doer.  Not only sculpting but sculptor; not only drinking but alcoholic; not only psychotherapy but psychotherapist.  Clearly, when we refer to a person in this way, the implication is that the activity is part of that person’s identity and that they do it frequently and regularly.

As it happens, we also do this for crimes.  We not only have names for crimes, but for “criminals.”  We not only have theft, we have thief.  We don’t only have murder, we have murderer.  Not only rape but rapist.  And notice that I placed “criminals” in quotation marks: we not only have crime, we have criminal.  As before, the implication is that the person commits crime frequently and regularly and that crime is an inseparable part of that person.

Is this correct?  For what number of those that we call “thieves” is thievery a part of their identity?  How many of those that we refer to as “murderers” commit murder often and regularly?  An extreme minority.  Referring to a person that committed murder as a “murderer” is as correct as referring to me as an “astronomer” because I once looked through a telescope.

Why then do we refer to people that have broken the law in these terms?  I believe the major reason is a desire to dehumanize in order to not understand: she did not commit rape because of a complex of causation leading to the event, or because of the situation in which she did it; rather, the act of rape revealed her true identity as a rapist.  The causal explanation of guilt, which is difficult to understand, is replaced by the guilt itself, which is not an explanation at all and so is easy to understand.

Boycotting for the masses: a web solution


Abstract/TL;DR

There is much that is wrong with the world and corporations are often to blame.  The most effective method of protest against the corporation is the boycott.  But boycotts are hard work requiring far too much time and effort.  Targeted boycotts are easier but leave the majority of guilty corporations unpunished.  The problem is that boycotts are uncoordinated and require instant access to information at every potential purchase.  I offer a potential web solution with four components: (1) a crowdsourced machine-readable database of objectionable things and their supporters; (2) a user area where users can register their participation in boycotts of those things; (3) user software to aid purchasing by providing instant information on whether a given product should be avoided; (4) a public website showing manufacturers’ estimated losses as a result of their actions.

Introduction: why boycott?

We live in a world of corrupt politicians and psychopathic corporations.  This much is entirely uncontroversial, and there’s no need for examples here.

Here’s the real problem: what the fuck can we do about it?

There are many tactics: letter-writing, indignation, using your vote, peaceful protest, violent protest, website banners, and so on.  They have their place but their effectiveness is limited for the simple reason that they don’t attack the enemy where it hurts.

We must understand that corruption is fundamentally driven by money and profit, not by ignorance, immorality, chutzpah, an illusion of public support, or anything else.  And so it must be here that we attack it.

There’s one simple tactic that does so: the boycott.

Boycotts are hard

In January 2012, Maddox posted an article about SOPA, the conclusion being that boycotting is the only way we’ll stop SOPA and everything like it that will follow.

What the article doesn’t address is the subsequent problem: boycotting is hard.  Why?

  • There are hundreds of organizations that officially support SOPA/PIPA/the next incarnation of the many bills designed to take away your freedom.
  • Of those, most will have many subsidiary companies or other connected organizations, meaning potentially thousands of brands one has to be aware of.
  • We buy things all the time and we don’t have the time for research into the political stance of our shampoo manufacturers (etc).
  • This is just one boycott!  The well-informed person may want to boycott many pieces of legislation, many corporations, many states, entire industries they disapprove of, and so on.

Here are some scenarios demonstrating the barrier:

  • You’re at your local shopping mall looking to buy X.  There are several shops that sell X.  But which of them support that new bill Y you hate so much?  No time to find out …
  • You’re at your local convenience store looking to buy X.  There are several different brands of X.  But which of them are associated with sweatshop work?  No time to find out …
  • You’re shopping for Xs in the vegetable aisle.  There are Xs imported from countries Y, Z, and Q.  But which of those have a horrible foreign policy?  No time to find out …

For me, these barriers are far too high: to research this effectively I would have to give up my job (which in turn would remove my income that gives me my power to boycott).

A non-solution: select, target, scapegoat

But we know that a boycott can be effective once it reaches a critical mass of support and media coverage.  This is what happened with GoDaddy.

This is the basis of Maddox’s proposal: choose a small number of companies and hit them hard.  This targeted, scapegoating approach is based on the understanding that most people can’t be bothered to do the research.

It is more effective than the “learn this list of companies” approach.  The problem is that all those other companies get off free!

A potential solution: the web, crowdsourcing, and purchasing adviser software

The good news is that the number of people who would like to take part in boycotts is far larger than the number who have the time and determination to do so.  There is latent energy to be unleashed.  Unleashing this energy must be done by lowering the barrier to entry.  In short, if I’m going to boycott, then the research must be done for me and be instantly accessible.

My proposed solution has four key parts:

  1. A publicly accessible, well-researched, up-to-date, independent, crowdsourced database of wrong-doings and their supporters.  The world already has a half-solution: Wikipedia, the success of which relies on user-generated content.  However, it is unstructured content designed for the casual reader, and the information is not targeted to boycotters.
  2. A users’ site where the boycotter can declare the causes that they support.  In conjunction with the above database the site can then produce a comprehensive list of corporations/states/etc they should boycott.
  3. Software to help the user assess individual purchases.  For example:
    1. A browser plugin.  This has the following components:
      1. Access to the user account on the above site.  It therefore knows the manufacturers (etc) the user wishes to avoid.
      2. Access to databases of individual products (such as the Household Products Database).  It therefore knows, given a product, whether the user should avoid it.
      3. Access to the user’s browsing and the ability to inject warnings.  For instance, when shopping on Amazon.com, the plugin highlights products to boycott.
      4. The opt-in ability to supply the user’s boycotting history to the users’ site.
    2. A barcode-scanning smartphone app.  Similar components to the above plugin, with the ability to identify a product from its barcode (like existing price comparison apps, e.g. Scandit for iPhone or Barcode Scanner for Android).
  4. An online summary of boycott effectiveness.  If companies don’t know why their sales are falling, they won’t change their stance!  Using data volunteered by users, we can publish (user-anonymized) estimates of how many dollars a manufacturer has lost due to their support of such-and-such.
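
To make the four parts a little more concrete, here is a minimal sketch of how they might fit together.  Everything in it is an assumption of mine for illustration: the class and function names, the idea of keying products by barcode, and the example company names are all hypothetical, and a real system would need provenance, moderation, and proper storage.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BoycottDatabase:
    """Part 1: the shared database (in-memory stand-in, hypothetical schema)."""
    supporters: dict = field(default_factory=dict)  # cause -> set of organizations
    parents: dict = field(default_factory=dict)     # brand/subsidiary -> parent organization
    products: dict = field(default_factory=dict)    # barcode -> brand

    def ownership_chain(self, brand):
        """Return the brand followed by each of its parent organizations."""
        chain = []
        while brand is not None and brand not in chain:  # guard against cycles
            chain.append(brand)
            brand = self.parents.get(brand)
        return chain

    def reasons_to_avoid(self, barcode, user_causes):
        """Part 3: given a scanned product and the causes a user has
        registered (part 2), return the (cause, organization) pairs that
        make the product one to avoid.  An empty set means no objection
        is recorded for it."""
        brand = self.products.get(barcode)
        if brand is None:
            return set()
        chain = self.ownership_chain(brand)
        return {
            (cause, org)
            for cause in user_causes
            for org in chain
            if org in self.supporters.get(cause, set())
        }


def estimated_losses(db, user_causes, avoided_purchases):
    """Part 4: total the prices of purchases a user reports having avoided,
    per organization.  Each avoided purchase counts once per organization,
    however many causes it touches."""
    losses = defaultdict(float)
    for barcode, price in avoided_purchases:
        orgs = {org for _cause, org in db.reasons_to_avoid(barcode, user_causes)}
        for org in orgs:
            losses[org] += price
    return dict(losses)


# Hypothetical example data: "MegaCorp" supports SOPA and owns "ShinyHair".
db = BoycottDatabase(
    supporters={"SOPA": {"MegaCorp"}},
    parents={"ShinyHair": "MegaCorp"},
    products={"0001": "ShinyHair", "0002": "IndieSoap"},
)
my_causes = {"SOPA"}
```

With this data, `db.reasons_to_avoid("0001", my_causes)` flags the shampoo because its parent supports SOPA, while barcode `"0002"` comes back clean.  The ownership chain is the part that does the work I don’t have time for: the shopper never needs to know who owns what.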

Here’s a quick feasibility study:

  • The required technology exists and is mature.  User-produced, user-audited content is everywhere.  Independent product databases exist.  Barcode-scanning is reliable.
  • People are comfortable with the technology.  Users already guide their purchases with price comparison and user-review websites/apps.
  • Initial costs are for the software; volunteers and funding should be available.  The open-source voluntary model is successful.  Necessary funding could also be found on, say, Kickstarter.
  • Ongoing costs are mainly for servers, and other sites get by.  Non-profits like Wikipedia survive on voluntary contributions and this should be no different.

So, what do you say?  LET’S BOYCOTT!

The Thatcher effect in typography


Near my home, on my dog-walking route, there’s a small business called ‘FDM.’  Their initials are emblazoned in 2000pt capitals on the side of the property.  I don’t know what it stands for, or what they do.  The reason I bring them up is one screamingly disrespectful disregard for typography, in what is otherwise an entirely sober Roman-esque sign.

I’m going to first show it to you upside down, in a fictional billboard advertisement sponsored by the lovely Jayma Mays:

Lovely — both of them sexy and sophisticated; both with subtle, clean curves that demand attention precisely due to their understatedness; both enticing you, by just giving a little away, to look further.

Right?

Yes, until the potential customer stops sitting in the driver’s seat upside-down, or reverts from their hand-walking on the pavement:

WTF?!  Or should I say, MTF?  The ‘M’ has suddenly broken in two and fallen in on itself.  Why was it considered a good idea to use an upside-down ‘W’ in place of the ‘M’ into which the artist had poured countless hours of labour in order to be completely unobtrusive?

Is it intended as ‘attention-grabbing’? It works; but not all publicity is good publicity — I’m going to have to find myself a new dog-walking route.

P.S. I’m sorry, Jayma.  Every stroke of the GIMP brush was like a dagger in your baby-soft skin.  But it was in the name of Typography!

Redundant information in unordered lists: fundamental?


Let’s say you have a list of items, in some specific ordering: the list of your friends [james, tom, harry], say, in order of age.  The way I see it, there are two “types” of information here: the list items, and the list order.

Now, let’s say you want to make a list of specific “friendships”: [(eegg, james), (eegg, tom), (eegg, harry), (james, tom), (tom, harry)].  Now, there are several ‘orderings’ in this list that could be used to convey information: the order in which you list the friendships, and the order in which you list the two friends in each 2-tuple.

But what information can you convey using these orderings?  When specifying a friendship between two people, can you identify two “roles” played there?  Answer: no.

So you decide that you want to specify those friendships without putting any information in those orderings.  As you have many friends and space is at a premium, you also decide that you want to compress that data to “squeeze out” the wasted data that is taken up by the arbitrary order in which everything is specified.  Open question: how do you go about this?

Or to put the question a little more mathematically/computer-sciencey: what is the most spatially efficient way of serializing an arbitrary unordered set of items?
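
One way to get a feel for the question, offered as a sketch rather than an answer: if the items are first mapped to distinct non-negative integers below some n, then a set of k of them is one of C(n, k) possibilities, so about log2 C(n, k) bits should suffice.  The combinatorial number system gives each such set its own number in that range.  The function names below are my own, and the sketch assumes the decoder already knows k.

```python
from math import comb


def rank_subset(items):
    """Map a set of distinct non-negative integers to a single integer,
    using the combinatorial number system.  The order in which the items
    are supplied makes no difference to the result."""
    return sum(comb(c, i + 1) for i, c in enumerate(sorted(items)))


def unrank_subset(rank, k):
    """Inverse of rank_subset for a set known to have k items."""
    items = set()
    for i in range(k, 0, -1):
        # Find the largest c with comb(c, i) <= rank.
        c = i - 1
        while comb(c + 1, i) <= rank:
            c += 1
        items.add(c)
        rank -= comb(c, i)
    return items


def bits_for_subset(n, k):
    """Bits needed to store the rank of any k-item subset of range(n)."""
    return (comb(n, k) - 1).bit_length()
```

For the friendships above there are four people, hence six possible unordered pairs; numbering those pairs 0 to 5, a set of five friendships is a 5-item subset of six, and `bits_for_subset(6, 5)` comes to 3 bits.  Whether this is really the *most* efficient scheme for arbitrary items, where n is not known in advance, is exactly the part that stays open.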

“Normally, I hand craft my images using vim.”


The above is a quote from Sam Ruby’s blog.  This deliciously innocent pedantry made me choke on my coffee in laughter.

The long road from HTML to PDF


HTML and PDF are the two most common formats on the web.  With good reason: HTML and friends give you portability on the screen, while PDF gives you portability on paper.  With a few exceptions, they’re two ways of displaying the same content, optimized for different media.  So many developers will want to present their end-users with both ways of viewing their content: are you going to read this on screen, or on paper?  What’s more, developers will want the brand-image of their site also carried to the end media.

How do we achieve this?  The solution will inevitably involve conversion from a reference format, and I can see three possible architectures here:

  1. Develop in HTML; convert this to PDF when needed.
  2. Develop in PDF; convert this to HTML when needed.
  3. Develop in format X; convert this by turns to HTML and PDF when needed.

Let’s dismiss architecture no. 2 straight away; no sane person develops in PDF.  Architecture no. 3 has some successful implementations (I’m thinking of reST here), but it has at least one problem: it makes my second requirement, bringing the brand-image to both formats, difficult.  Why?  Because “brand image” will be specified somewhere external to the content itself, in a format specific to content format X, and this will then have to be transformed into CSS on the HTML side, and whatever else on the way to PDF (e.g. through LaTeX, of which I am ignorant).  This is presumably possible, but hard.  The only easy (and sensible) way architecture no. 3 can get around this is to go by the [Format X -> HTML -> PDF] route, but in this case we’re back to implementing architecture no. 1.

So, my iron logic dictates that the way of distributing content as both HTML and branding-aware PDF is to use an HTML -> PDF converter.


Now, searching for this will bring you examples of software making valiant attempts to implement all of HTML and CSS, including the CSS2 and 3 additions for paged media.  I know of three converters of this type:

  • Pisa, or XHTML2PDF.  Written in Python, under GPL or commercial license.
  • dompdf, “a (mostly) CSS 2.1-compliant HTML to PDF converter.”  Written in PHP, licensed under LGPL.
  • Prince XML.  Closed source and bloody expensive, though a free watermarking version is available.

I’m going to dismiss Pisa: my (quickly aborted) experience with it has been awful (every block element was placed in a visible box; loads of lines of CSS were mis-interpreted).  As for dompdf, I’ve actually used it to good effect when writing in PHP.  A couple of problems with it: I’m not a big fan of PHP (not well-suited to my ideal usage of a converter on the command line), and there are some things it couldn’t handle so well: in particular, pagination (with tables), the crucial ability of anything converting to a page-oriented format.

The third, Prince XML, is pretty good at what it does.  However, the free version I’ve tried out places its logo on the first page.  I don’t want this blemish on my lovingly crafted documents; nor do I want to waste my precious printer ink.  I considered hacking away the watermark: firstly by inserting an extra first page via CSS, then slicing off the first page of the PDF, but this fudges things like page numbering; secondly by programmatically removing the watermark, but this came to nil too (I couldn’t figure out how to do it).  In any case, I don’t feel comfortable violating the license like that, and it’s an ugly solution.  Also, did I mention that Prince XML is bloody expensive?


It took me quite a while to realise what is now an obvious point: all these programs are reinventing the wheel.  They are literally implementing their own browser and its ability to print to PDF.  Most of you can do all that right now: File > Print > Print to File, depending on your browser.  The happy realisation is that the bulk of the work needed for our task is right here, hidden away in our open-source browsers.

But let me re-cap exactly what I want to be able to do:


eegg@pc:~ ls
foo.htm
eegg@pc:~ cat foo.htm | html2pdf > foo.pdf
eegg@pc:~ ls
foo.htm foo.pdf

This CLI program, html2pdf, is not so difficult to create, considering that it just has to harness the power of a browser engine.  Gecko, the Firefox engine, uses the cairo graphics library, which can produce PDFs.  It seems logical, therefore, that it could easily be harnessed (here’s one long request for that).  One attempt has been made at a plugin that allows you to order a PDF copy of the page when launching Firefox: cmdlnprint.  It works, with some fairly big shortcomings:

  • You have to have an installation of Firefox.
  • Despite being ordered from the command line, it launches a window before doing its thing.  When I first used this for a batch job, I had hundreds of windows open and my computer ground to a halt.
  • It’ll interact with your ordinary Firefox profile — if you set your paper size to A5 for a print job in your lunch hour, your later batch jobs will use the same setting.  Yes, you can create a separate profile for the conversion job, but my experience of this hasn’t been good.
  • Firefox doesn’t seem to have very good support for CSS when it comes to printing.  It doesn’t seem to play well with paper sizes, print margins, and other things.

So, what engines will work well?  Let’s have a look at some cross-browser tests for HTML5 and CSS3.  Flying ahead is Safari, the Mac browser.  Before you lament, “I’m not on a Mac!”, these results are those of WebKit, the engine behind Safari, and WebKit is open source and free.  So where is WebKit outside of Safari?  WebKit requires a widget toolkit in order to run, and the two toolkits in my world are Qt and GTK.  Both have projects for integrating WebKit; respectively, QtWebKit and WebKitGTK.


Herein lies the answer.  I stumbled across a project which seemingly has next to no publicity.  Without further ado, it’s wkhtmltopdf.  That’s WebKit HTML to PDF.  What’s more, if you’re on a Debian system, it’s in the repository.
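
The html2pdf filter I wished for above can be approximated with a thin wrapper.  This is a minimal sketch, assuming wkhtmltopdf is installed and on the PATH; it relies on wkhtmltopdf accepting `-` for both input and output to mean stdin and stdout.  The function names are my own.

```python
import subprocess
import sys


def build_command(executable="wkhtmltopdf"):
    """Command line that reads HTML on stdin and writes PDF to stdout.
    "--quiet" suppresses progress output; "-" "-" means stdin and stdout."""
    return [executable, "--quiet", "-", "-"]


def html_to_pdf(html, executable="wkhtmltopdf"):
    """Convert HTML (bytes) to PDF (bytes) by piping it through wkhtmltopdf.
    Raises subprocess.CalledProcessError if the converter exits with an error."""
    result = subprocess.run(
        build_command(executable),
        input=html,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


def main():
    """Act as a filter: HTML in on stdin, PDF out on stdout."""
    sys.stdout.buffer.write(html_to_pdf(sys.stdin.buffer.read()))
```

Saved as an executable script named html2pdf with a call to `main()` at the bottom, this would allow `cat foo.htm | html2pdf > foo.pdf`.  Note that relative links to stylesheets and images have no base path when the HTML arrives on stdin, so a page that depends on them may need absolute URLs.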

How good is it?  I tested it against the HTML+CSS in an article at A List Apart talking about the use of Prince XML.  Compared to the output that Prince XML created from it, it’s fairly impressive.