Female Wizards on the Discworld

I have a theory about female wizards on the Discworld, which is just about possible to defend on the basis of the first three books (The Colour of Magic, The Light Fantastic, and Equal Rites), but which also owes something to much later ones (The Last Continent and Unseen Academicals).

Unseen University is the Discworld’s premier college of magic (in its own mind, at least), but it is not the only one. We already know there are wizards on Krull, which is almost completely separate from the rest of the Disc. We know that at least one of them (Marchesa) was female, and that this was not remarked on.

Bugarup University in XXXX is another example of a separate college of magic, though it too is all-male.

In Unseen Academicals, we see that in fact Unseen University has rivals not only in places separated from the rest of the Disc, but also on the main supercontinent. The UU Dean has just set one up in Quirm; there’s a visiting professor from one in Genua; and I recall hints that there are others, with long-standing rivalries between them. (And Ponder Stibbons can play academic politics as well as anyone, thank you very much!)

We also see that there are differences in academic culture. UU wizards must be celibate (there’s a hint in The Colour of Magic, and again in Sourcery, that it’s only straight sex which is outlawed, but the possibilities thereby left open are never explored). This does not appear to be the case in Genua. When Ridcully heard that a Genuan wizard had been “named in divorce proceedings”, he just assumed that the Genuans didn’t prohibit their wizards having sex with women. He had to be explicitly told that in this case it was gay sex.

So UU has a “straight sex explicitly outlawed” and “gay sex doesn’t happen, does it?” culture, but accepts that other universities have different cultures. The one in Krull even accepts women, and did even before Esk’s time. Maybe others do too? Maybe, in fact, UU is the sole remaining holdout?


Why did Thorongil warn Ecthelion against the White Wizard?

I’m rereading The Lord of the Rings. Appendix A, “Annals of the Kings and Rulers”, tells us that Aragorn son of Arathorn spent part of his youth in Minas Tirith under the assumed name “Thorongil” serving under Ecthelion, Steward of Gondor.

Thorongil often warned Ecthelion not to put trust in Saruman the White in Isengard, but to welcome rather Gandalf the Grey.

Appendix B, “The Tale of Years”, tells us,

2957-80 Aragorn undertakes his great journeys and errantries. As Thorongil he serves in disguise both Thengel of Rohan and Ecthelion II of Gondor.
10th July 3018 Gandalf imprisoned in Orthanc.
18th September 3018 Gandalf escapes from Orthanc in the early hours.
25th October 3018 Council of Elrond.

Saruman’s treachery was not clear to anyone before the dispute with Gandalf in July 3018. And Aragorn did not learn of it till he and Gandalf met again in Rivendell in October. So why was Aragorn already suspicious of Saruman roughly 40 years earlier?

I asked this question a while ago on Science Fiction & Fantasy Stack Exchange. I got a few good answers. Here’s the one I selected as the best, from Peter Turner:

Not sure where this is in the annals, but it says in the Tolkien Companion by J.E.A. Tyler

Saruman made his first deliberate move in this direction (toward imposing his will, which was forbidden of the Istari) in the year 2759 Third Age, when he appeared at the Coronation of King Frealaf of Rohan, successor of the mighty Helm Hammerhand. The Wizard brought with him rich presents, and declared himself the friend of Rohan and Gondor, and a little later was able to persuade Steward Beren of Gondor to grant him the Keys of Orthanc, the mighty Tower which, together with its fortress of Isengard, commanded the strategic Gap of Rohan. All thought this was a welcome move.

All, that is, except a weary ranger who would see everything given up by Gondor as a challenge to its power.

And it further says that

all the time the Wizard was secretly searching the Tower of Orthanc for a long-lost treasure of the Dunedain … the Palantír of Orthanc.

Then in 2851 the White Council met to think of ways to stop Sauron from coming back

Saruman, hoping that the Ring would expose its location if Sauron were left unharassed, deliberately overruled a strong recommendation (from Gandalf) … that Dol Guldur be attacked.

By his actions, Gandalf may have suspected that Saruman was up to something, although I don’t think Gandalf even knew of the ring.

So, either through his own understanding of the Palantír through the lore of his people or through his association with Gandalf, Aragorn was more naturally suspicious than Gandalf and I think it makes sense that he’d know something was amiss well before anyone else had reason to suspect.

You can read Peter’s answer and all the others at SF&F SE. This entire blog post, both my own writing and the section I quoted from Peter, is under the license CC BY-SA 3.0. Feel free to repost elsewhere.


Tu Chu asked a question on Stack Overflow:

I am developing a game for the iOS (and later for Android) devices which needs to get data from a database on a server. What I have done so far is to use PHP to echo out the data from the database as XML. The program will check often with the server so performance is a big deal here. So, would JSON or XML be better for this task?

Well, which is better? I don’t know. It depends on the specific use case, and we don’t have enough detail to answer that question. And this, indeed, is what I said:

Produce XML output. Check the time taken and the file size.

Produce JSON output. Check the time taken and the file size.

Decide which is best.

What more could I say?
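For what it’s worth, that measurement needn’t take long. Here is a minimal sketch in Python (the original question used PHP; the sample rows, field names and element names below are my own inventions for illustration) which serializes the same data both ways and reports the size and the time taken. The numbers only mean something when run against your own data on your own server.

```python
import json
import timeit
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for rows read from the game's database.
ROWS = [{"id": i, "name": f"player{i}", "score": i * 10} for i in range(100)]


def to_json(rows):
    """Serialize the rows as a compact JSON array."""
    return json.dumps(rows, separators=(",", ":")).encode("utf-8")


def to_xml(rows):
    """Serialize the rows as XML, one <row> element per record."""
    root = ET.Element("rows")
    for row in rows:
        item = ET.SubElement(root, "row")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def measure(serializer, rows, repeats=200):
    """Return (size in bytes, average seconds per call) for one serializer."""
    size = len(serializer(rows))
    seconds = timeit.timeit(lambda: serializer(rows), number=repeats) / repeats
    return size, seconds


if __name__ == "__main__":
    for name, fn in (("JSON", to_json), ("XML", to_xml)):
        size, seconds = measure(fn, ROWS)
        print(f"{name}: {size} bytes, {seconds * 1e6:.1f} microseconds per call")
```

This only measures the server side. Parsing time on the device matters just as much, and would need the same kind of test there.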

When Alice asked a more general question on Programmers Stack Exchange, I was able to say more.

Alice asked,

How important is it to learn XML when JSON is able to do almost all that I need? Having said that, I use JSON mainly for AJAX requests and obtaining data from various APIs. I am a total newbie to web development and the reason I am asking this is that I want to know whether I should go ahead and buy a book on XML or whether I can just give it a pass.

Well, while XML and JSON do have overlaps in use-cases, they are actually very different languages with very different design goals, so I replied,

XML definitely outshines JSON for markup (which is, after all, hinted at in the name).

I wouldn’t like to see a random XHTML page converted into JSON format. It would be horrible. OpenOffice and the latest editions of Microsoft Office all use compressed XML as their format of choice.

As a general rule: Markup goes in XML; structured data goes in JSON.

That’s when you’re outputting data and have full control yourself over the format. If you’re outputting data according to industry standards, or consuming other people’s data, you may need to use XML even in places where JSON would seem more appropriate. That’s because XML is longer established and has been used in many standards.
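To make the markup point concrete, here is a small sketch in Python. It takes one sentence of XHTML-style markup and converts it to a JSON structure that keeps the text and the inline elements in order. The "tag" and "children" keys are my own invention (there is no standard way of doing this), which is rather the point.

```python
import json
import xml.etree.ElementTree as ET

# Markup: running text with elements mixed into it.
MARKUP = "<p>Markup goes in <em>XML</em>; structured data goes in <em>JSON</em>.</p>"


def element_to_json(elem):
    """Convert an element to a JSON-friendly dict, keeping mixed content in order."""
    children = []
    if elem.text:
        children.append(elem.text)
    for child in elem:
        children.append(element_to_json(child))
        if child.tail:
            # Text that follows a child element belongs to the parent's content.
            children.append(child.tail)
    return {"tag": elem.tag, "children": children}


if __name__ == "__main__":
    as_json = json.dumps(element_to_json(ET.fromstring(MARKUP)))
    print(len(MARKUP), "characters as XML")
    print(len(as_json), "characters as JSON:", as_json)
```

The JSON version is longer and much harder to read by eye, and that is for a single sentence with two emphasized words.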

License: CC BY-SA 3.0. Feel free to repost elsewhere.

How can I send 100,000 e-mails weekly?


That really is the simplest answer.

On Stack Overflow, xRobot asked for guidance on setting up a system which would send 100,000 e-mails every week to a variety of addresses. This is, actually, quite tricky, as was demonstrated in Piskvor’s rather awesome answer. Here it is:

Short answer: While it’s technically possible to send 100k e-mails each week yourself, the simplest, easiest and cheapest solution is to outsource this to one of the companies that specialize in it (I did say “cheapest”: there’s no limit to the amount of development time (and therefore money) that you can sink into this when trying to DIY).

Long answer: If you decide that you absolutely want to do this yourself, prepare for a world of hurt (after all, this is e-mail/e-fail we’re talking about). You’ll need:

  • e-mail content that is not spam (otherwise you’ll run into additional major roadblocks on every step, even legal repercussions);
  • in addition, your content should be easy to distinguish from spam — that may be a bit hard to do in some cases (I heard that a certain pharmaceutical company had to all but abandon e-mail, as their brand names are quite common in spam mailings);
  • a configurable SMTP server of your own — one which won’t buckle when you dump 100k e-mails onto it (your ISP’s upstream server won’t be sufficient here and you’ll make the ISP violently unhappy; we used two dedicated boxes);
  • some mail wrapper (e.g. PhpMailer if PHP’s your poison of choice; using PHP’s mail() is horrible enough by itself);
  • your own sender function to run in a loop, create the mails and pass them to the wrapper (note that you may run into PHP’s memory limits if your app has a memory leak; you may need to recycle the sending process periodically, or even better, decouple the “creating e-mails” and “sending e-mails” altogether).
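The last point, decoupling “creating e-mails” from “sending e-mails”, might look something like this minimal Python sketch (the answer talks about PHP; the function names and the `deliver` callable are assumptions of mine, not part of any library). One function fills a queue with messages and another drains it, so either side can be restarted or replaced without touching the other.

```python
import queue
from email.message import EmailMessage


def create_messages(recipients, sender, subject, body, outbox):
    """Producer: build one message per recipient and put it on the queue."""
    for address in recipients:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = address
        msg["Subject"] = subject
        msg.set_content(body)
        outbox.put(msg)


def send_messages(outbox, deliver):
    """Consumer: drain the queue, handing each message to `deliver`.

    `deliver` is a hypothetical callable; in practice it might wrap
    smtplib.SMTP.send_message. Failures are collected rather than raised,
    so that one bad address does not stop the whole run.
    """
    sent, failed = 0, []
    while True:
        try:
            msg = outbox.get_nowait()
        except queue.Empty:
            break
        try:
            deliver(msg)
            sent += 1
        except Exception as exc:  # a real sender would catch narrower errors
            failed.append((str(msg["To"]), str(exc)))
    return sent, failed
```

An in-memory queue is the simplest possible stand-in. A real system would more likely use a database table or a message broker, so that the queue survives a restart.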

Surprisingly, that was the easy part. The hard part is actually sending it:

  • Some servers will ban you when you send too many mails close together, so you need to shuffle and watch your queue (e.g. send one mail to joe@example.com, then three to other domains, only then another to other_address@example.com).
  • You need to have correct PTR, SPF, DKIM records.
  • You need to handle remote server timeouts, misconfigured DNS records, and other network pleasantries.
  • You need to handle invalid e-mails (and no, regex is the wrong tool for that).
  • You need to handle unsubscriptions (many legitimate newsletters have been reclassified as spam due to many frustrated users who couldn’t unsubscribe in one step and instead chose to “mark as spam” — the spam filters do learn, especially with large e-mail providers).
  • You need to handle bounces and rejects (“no such mailbox ojhn@example.com”; “mailbox john@example.com full”).
  • You need to handle blacklisting and removal from blacklists. (Sure, you’re not sending spam. Some recipients won’t be so sure: with such a large list, it will happen sometimes, no matter what precautions you take. Some people (e.g., your not-so-scrupulous competitors) might even go as far as to falsely report your mailings as spam; it does happen. On average, it takes weeks to get yourself removed from a blacklist.)
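The first item in that list, shuffling the queue so that mails to the same domain aren’t sent back to back, can be sketched as a simple round-robin over domains. This is a Python illustration of my own, not the answer author’s code, and it makes no attempt at optimal spacing.

```python
from collections import defaultdict, deque


def interleave_by_domain(addresses):
    """Order addresses round-robin across their domains.

    This spreads each domain's addresses out. It is a simple round-robin,
    not an optimal spacing: once only one domain has addresses left, the
    remainder will still be consecutive.
    """
    by_domain = defaultdict(deque)
    for address in addresses:
        domain = address.rsplit("@", 1)[-1].lower()
        by_domain[domain].append(address)

    # Visit the domains with the most addresses first.
    queues = sorted(by_domain.values(), key=len, reverse=True)
    ordered = []
    while queues:
        for q in queues:
            ordered.append(q.popleft())
        queues = [q for q in queues if q]
    return ordered
```

For example, joe@example.com, other_address@example.com, a@example.org and b@example.net come out with the two example.com addresses first and last. Ordering alone won’t save you if one domain dominates the list; you would still need delays between sends.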

And to top it off, you’ll have to manage the legal part of it (various federal, state, and local laws; and even different tangles of laws once you send outside the U.S. (note: you have no way of finding out whether joe@example.com lives in Southwest Elbonia, the country with the world’s most draconian antispam laws)).

I’m pretty sure I missed a few heads of this hydra — are you still sure you want to do this yourself? If so, there’ll be another wave, this time merely the annoying problems inherent in sending an e-mail. (You see, SMTP is a store-and-forward protocol, which means that your e-mail will be shuffled across many SMTP servers around the Internet, in the hope that the next one is a bit closer to the final recipient. Basically, the e-mail is sent to an SMTP server, which puts it into its forward queue; when time comes, it will forward it further to a different SMTP server, until it reaches the SMTP server for the given domain. This forward could happen immediately, or in a few minutes, or hours, or days, or never.) Thus, you’ll see the following issues — most of which could happen en route as well as at the destination:

  • The remote SMTP servers don’t want to talk to your SMTP server.
  • Your mails are getting marked as spam (<blink> is not your friend here, nor is <font color=...>).
  • Your mails are delivered days, even weeks late (contrary to popular opinion, SMTP is designed to make a best effort to deliver the message sometime in the future — not to deliver it now).
  • Your mails are not delivered at all (already sent from e-mail server on hop #4, not sent yet from server on hop #5, the server that currently holds the message crashes, data is lost).
  • Your mails are mangled by some poorly designed server en route (this one is somewhat solvable with base64 encoding, but then the size goes up and the e-mail looks more suspicious).
  • Your mails are delivered and the recipients seem not to want them (“I’m sure I didn’t sign up for this, I remember exactly what I did a year ago” (of course you do, sir)).
  • There are problems with users with various versions of Microsoft Outlook and its unique handling of Internet mail.
  • You hit wizard’s apprentice mode (a self-reinforcing positive feedback loop — in other words, automated e-mails as replies to automated e-mails as replies to…; you really don’t want to be the one to set this off, as you’d anger half the internet at yourself).

And it’ll be your job to troubleshoot and solve this (hint: you can’t, mostly). The people who run legit mass-mailing businesses know that in the end you can’t solve it, and that they can’t solve it either, and they have the reasons well researched, documented and outlined (maybe even as a PowerPoint presentation, complete with sounds and cool transitions, that your bosses can understand), as they’ve had to explain this a million times before. Plus, for the problems that are actually solvable, they know very well how to solve them.

If, after all this, you are not discouraged and still want to do this, go right ahead: it’s even possible that you’ll find a better way to do this. Just know that the road ahead won’t be easy — sending e-mail is trivial, getting it delivered is hard.

I’ve rewritten that slightly to tweak the grammar and to avoid a couple of unnecessary and potentially triggering metaphors. As good as it is, it’s not the last word on the subject. Here’s more advice, from splattne, on how not to be marked as a spammer:

Be sure that your e-mails don’t look like typical spam e-mails: don’t insert only a large image; check that the character-set is set correctly; don’t insert “IP-address only” links. Write your communication as you would write a normal e-mail. Make it really easy to unsubscribe or opt-out. Otherwise, your users will unsubscribe by pressing the “spam” button, and that will affect your reputation.

On the technical side: if you can choose your SMTP server, be sure it is a “clean” SMTP server. IP addresses of spamming SMTP servers are often blacklisted by other providers. If you don’t know your SMTP servers in advance, it’s a good practice to provide configuration options in your application for controlling batch sizes and delay between batches. Some mail servers don’t accept large sending batches or continuous activity.
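As a rough illustration of those configuration options, here is a Python sketch with a configurable batch size and delay between batches. The `BatchConfig` class, its default values and the `send_one` callable are all hypothetical; sensible numbers depend entirely on the mail servers involved.

```python
import time
from dataclasses import dataclass


@dataclass
class BatchConfig:
    """Hypothetical settings an application could expose to its administrator."""
    batch_size: int = 50
    delay_seconds: float = 30.0


def send_in_batches(messages, send_one, config, sleep=time.sleep):
    """Send messages in batches, pausing between batches.

    `send_one` is a hypothetical callable that delivers a single message.
    `sleep` is injectable so that the pause can be skipped in tests.
    Returns the number of batches sent.
    """
    batches = 0
    for start in range(0, len(messages), config.batch_size):
        if batches:
            # Pause before every batch except the first.
            sleep(config.delay_seconds)
        for message in messages[start:start + config.batch_size]:
            send_one(message)
        batches += 1
    return batches
```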

Use e-mail authentication methods, such as SPF and DomainKeys, to prove that your emails and your domain name belong together. A nice side-effect is that you help prevent your email domain from being spoofed. Also check your reverse DNS to make sure the IP address of your mail server points to the domain name that you use for sending mail.

Make sure that the reply-to address of your emails is a valid, existing address. Use the full, real name of the addressee in the To field, not just the email address (e.g. "John Doe" <john.doe@example.com>), and monitor your abuse accounts, such as abuse@example.com and postmaster@example.com.
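In Python, the header advice above could be followed with the standard library’s email package, roughly as in this sketch. The function name and its parameters are mine; `EmailMessage` and `formataddr` are real standard-library calls.

```python
from email.message import EmailMessage
from email.utils import formataddr


def build_newsletter(sender, reply_to, recipient_name, recipient_address, subject, body):
    """Build a plain-text message with a named recipient and a Reply-To header."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["Reply-To"] = reply_to  # should be a real, monitored mailbox
    # Full name plus address in the To field, not the bare address.
    msg["To"] = formataddr((recipient_name, recipient_address))
    msg["Subject"] = subject
    # set_content declares the character set in the Content-Type header
    # (UTF-8 by default), so it is never left for the recipient to guess.
    msg.set_content(body)
    return msg
```

None of this touches SPF, DomainKeys or reverse DNS, which are set up in DNS rather than in the message itself.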

Of course, on the other end, it’s also important to protect against spam coming in.

The copyright for the two essays quoted above rests with their original authors. They were originally published on Stack Overflow and Super User, respectively. Both of those essays, and this entire blog post, are under the license CC BY-SA 3.0. Feel free to repost elsewhere.

Tentative thoughts on Payment Gateways

If you’re selling stuff online, you need a “payment gateway”. That’s something that sits between your website and the bank, so that you can accept money over the Internet. PayPal is the biggest and best known of these. As far as I can tell, different payment gateways work in three different ways. (There may be more, but these are the three I’m aware of. I’m also including one other way of taking money, which is a gateway for payments, but isn’t a “payment gateway” according to the standard definition.) I don’t know of any standard terminology to distinguish these methods, so I’ve invented my own.

So, here’s my breakdown of five types of checkout. (Yes, five.)

1. Orders only

There are, I said, four types of payment gateway (or, three plus a bonus fourth), but five types of checkout. That’s because it’s possible to have something like an online shop where no money actually changes hands. I’ve built a site like this. It displays products, and has nice little “add to cart” buttons, so you can build up an order. You can then review your order and submit it. Then you get an order ID. The owner of the site will then contact you separately to arrange payment and delivery. This method works fine for the site in question (which is for a trade-only wholesale merchant in the fashion business).

This sort of checkout process is by far the easiest to build, as there is no need to interact with* any other system. It’s entirely self-contained.

2. Internal checkout

I call this one “internal” because all the work goes on behind the scenes. You have a form on your website into which the user enters their credit card details. (Watch this: you’re receiving credit card information, so now you’re under a legal obligation to deal with it carefully.) But we don’t store this information; instead, our webserver submits that credit card data to the payment gateway. The response from the gateway indicates whether or not the transaction was successful. (Submitting the information and receiving the response can be a single operation. The exact workflow will depend on the payment gateway concerned. The only one I’m familiar with, Realex Payments, receives the credit card information as a POST request in XML format, and returns the response (also in XML format) immediately.)

Remember, talking to the payment gateway is all happening behind the scenes. The customer has no idea how it works, or which payment gateway you’re using. They enter their credit card details into your website, press submit, and get a response of “payment made” or “declined” or whatever from your site. They have no need or reason to know or care which payment gateway you’re using.
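Here is a rough sketch, in Python, of what that single round trip might look like. Every element name, and the "00" success code, is a placeholder I made up for illustration. This is not the Realex format or any other real gateway’s, so check your provider’s documentation for the real thing. The `post` callable stands in for whatever HTTP client you use.

```python
import xml.etree.ElementTree as ET


def build_payment_request(order_id, amount_cents, currency, card_number, expiry):
    """Build a payment request document. All element names are hypothetical."""
    root = ET.Element("request")
    ET.SubElement(root, "orderid").text = order_id
    ET.SubElement(root, "amount", currency=currency).text = str(amount_cents)
    card = ET.SubElement(root, "card")
    ET.SubElement(card, "number").text = card_number
    ET.SubElement(card, "expiry").text = expiry
    return ET.tostring(root, encoding="utf-8")


def parse_payment_response(xml_bytes):
    """Return (succeeded, message) from a hypothetical response document."""
    root = ET.fromstring(xml_bytes)
    # "00" as the success code is an assumption made for this sketch.
    return root.findtext("result") == "00", root.findtext("message", default="")


def charge(post, url, **details):
    """Submit the card details and interpret the reply in one round trip.

    `post` is an injected callable (url, body) -> response bytes. The card
    data is passed straight through and is never stored by this code.
    """
    return parse_payment_response(post(url, build_payment_request(**details)))
```

The customer only ever sees the result: your own “payment made” or “declined” page, built from the pair that `charge` returns.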

An advantage of the internal checkout method is that you are entirely in control of the user experience. The customer never leaves your site; never sees any logos or branding other than yours. And if you think the checkout process is clunky and difficult to use, you can change it.

Realex Payments, based in Dublin, is one of the largest such operators in the European market. PayPal also supplies this type of payment gateway, but only in the UK, the USA, and Canada.

3. External checkout

For the external checkout, the customer is sent away to an external site to complete the payment process (hence, as you may have guessed, the name). So when the customer has added a few items to their cart, they can click on a button labelled “Pay with X”, proceed to that other site, and pay there. There are a few advantages to this. For a start, customers may be more likely to trust a big site like PayPal or Google with their credit card data. Also, they may already have an account there, and so be able to pay without having to type out their credit card information at all. And, from your perspective, you’re free from having to worry about credit card security: you never see any credit card information at all.

Another thing about the external checkout is that you aren’t limited to just the one of them. There’s nothing to stop you giving the customer options: you can put “Pay with PayPal” and “Check out with Google Checkout” buttons on the same page, and give the customer a choice. (You can also give the customer the choice of using the internal checkout, of course.)

The bad thing about the external checkout is that it’s a lot trickier to code. The workflow is far more complex. For a start, you have to send the customer away to another site, but with information. There’s no point just sending them away to PayPal. They have to go to PayPal with the information that they’ve come from you, and they’re buying such-and-such which costs so-and-so. The way to do that is to build a form on your site which contains all the necessary information in hidden fields, then use JavaScript (with a fallback to an image submit button) to send them away to the payment gateway (this is a POST request). (That’s how I did it for PayPal, anyway (and PayPal is the only such external checkout I’ve so far built).)
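A minimal Python sketch of generating that kind of form follows. The field names, the gateway URL and the image path are placeholders of mine, not PayPal’s actual parameters. The important details are that every value is HTML-escaped, and that the image button is there for customers without JavaScript.

```python
from html import escape


def build_checkout_form(action_url, fields):
    """Render an auto-submitting HTML form carrying `fields` as hidden inputs."""
    inputs = "\n".join(
        f'  <input type="hidden" name="{escape(name)}" value="{escape(str(value))}">'
        for name, value in fields.items()
    )
    return (
        f'<form id="checkout" method="post" action="{escape(action_url)}">\n'
        f"{inputs}\n"
        # Fallback for customers without JavaScript: a clickable image button.
        '  <input type="image" src="/img/pay.png" alt="Pay now">\n'
        "</form>\n"
        "<script>document.getElementById('checkout').submit();</script>"
    )
```

Since the customer’s browser carries this form to the gateway, the customer can edit anything in it. Nothing in the hidden fields should be trusted when it comes back.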

A while later, the customer will probably arrive back on your site with a similar POST request containing information that they’ve paid. This is all well and good, but how do you know it hasn’t been faked? Also, what if they never do come back to your site? What if they go to PayPal, pay, and then continue on their merry way without visiting your site again? Well, we don’t rely on that.

In our initial submission to PayPal, we send full data about all the products being purchased: ID, title, price, tax, shipping costs, and suchlike, but we also send them a URL and a unique tracking code. When the payment is made, PayPal posts all this data back to that URL (we call it the listener URL, because it sits and listens for PayPal to call it). That’s when we know the payment has gone through. Oh, wait, no we don’t. That could have been faked too. Remember, we aren’t contacting PayPal directly. We haven’t posted any information to PayPal’s servers. We’ve given information (including the unique tracking code and our listener URL) to the customer, and asked them to send it to PayPal. They could be tricking us. The “payment made” response to our listener URL could be a fake too.

So this is the point where we do contact PayPal directly. As soon as we receive the “payment made” call to our listener URL, we send all the information back to PayPal, basically asking “Hey, did this actually come from you?”. PayPal responds immediately, either confirming or denying the call. If PayPal confirms it, we mark the payment as made.

Hang on a second. Someone ordered on our site several thousand euros worth of goods. We created an order, gave it a unique tracking code, and sent them off to PayPal. Then we got word from PayPal that a payment had been made for an order with that tracking code. But the payment was only three euro. What now? And that’s why our listener doesn’t actually mark a payment as made as soon as PayPal confirms it. Instead, it reads all the data from the PayPal response, which includes full details of every item bought, its cost, and all related handling and shipping charges. We then verify that this all matches up with the order we have on record for that unique tracking code. And only then do we mark the payment as made. And we store the order with the PayPal transaction ID (this ID is generated by PayPal, and is unrelated to the code we generated and have used to track the order so far).
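Put together, the listener’s checks run in a fixed order: confirm the call with the gateway, find the order, compare the details, and only then mark it as paid. Here is a simplified Python sketch. The field names, the in-memory `ORDERS` table and the `ask_gateway` callable are all assumptions of mine, and it compares only the total and the currency, where the real listener compared every item and charge.

```python
from decimal import Decimal, InvalidOperation
from urllib.parse import urlencode

# In-memory stand-in for the orders table, keyed by our own tracking code.
ORDERS = {
    "TRK-1001": {
        "total": Decimal("2499.00"),
        "currency": "EUR",
        "paid": False,
        "transaction_id": None,
    },
}


def handle_notification(params, ask_gateway):
    """Process a "payment made" call to the listener URL.

    `params` is the dict of posted fields (the field names are hypothetical).
    `ask_gateway` is an injected callable that sends the encoded fields back
    to the gateway and returns True only if the gateway confirms sending them.
    Returns a short status string.
    """
    # Step 1: did this call really come from the gateway?
    if not ask_gateway(urlencode(params)):
        return "rejected: not confirmed by gateway"

    # Step 2: does it refer to an order we created?
    order = ORDERS.get(params.get("tracking_code"))
    if order is None:
        return "rejected: unknown tracking code"

    # Step 3: does what was paid match what we have on record?
    try:
        paid = Decimal(params.get("amount", ""))
    except InvalidOperation:
        return "rejected: amount mismatch"
    if paid != order["total"] or params.get("currency") != order["currency"]:
        return "rejected: amount mismatch"

    # Only now mark the order as paid, storing the gateway's transaction ID.
    order["paid"] = True
    order["transaction_id"] = params.get("transaction_id")
    return "paid"
```

With this in place, the three-euro payment against a several-thousand-euro order is rejected at step 3, even though the gateway confirms that the notification itself was genuine.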

When the customer returns to our site, they do so with a POST request from PayPal which includes the transaction ID. This request cannot have been faked, because a faker would have no way of knowing that transaction ID, which was generated by PayPal. So we can be happy that the person landing on our site now is the person who just made the purchase, and we can show a receipt. Everything is hunky-dory.

Except … wait for it. Sometimes, PayPal will return a customer to us before sending the order confirmation to our listener URL. So if a customer lands on our site with a transaction ID we don’t recognize, we can’t simply assume it’s an error. It might be an order which hasn’t come through yet. So we show the page and wait on it a while for an order to come through, checking occasionally (using Ajax) whether an order with that transaction ID has been processed. If it has, we redirect the customer to the receipt for that order. Failing that, we eventually give up.
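The “wait a while, check occasionally, eventually give up” logic is the same wherever it runs. On our site it was driven by Ajax calls from the browser; the Python sketch below does the equivalent in a single loop, purely to show the shape of it. The `find_order` lookup and the timings are assumptions.

```python
import time


def wait_for_order(transaction_id, find_order, timeout=60.0, interval=2.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until an order with this transaction ID appears, or give up.

    `find_order` is a hypothetical lookup returning the order, or None if the
    listener has not processed it yet. `clock` and `sleep` are injectable so
    that the loop can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        order = find_order(transaction_id)
        if order is not None:
            return order  # the caller can now redirect to the receipt
        if clock() >= deadline:
            return None  # give up; the caller shows a "we'll e-mail you" page
        sleep(interval)
```

Because the delay can be simulated by making `find_order` return None a few times, this part of the workflow can at least be tested on our side, even if PayPal’s side can’t be.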

Just to be slightly more awkward, in PayPal’s sandbox (test environment), there’s no way to force a delay in the call to the listener URL so you can test this workflow. You just have to code and hope it works. Read that sentence again: Sometimes, PayPal will return a customer to us before sending the order confirmation to our listener URL. Sometimes. There’s no way to force that situation so you can test it.

Also, in case you thought that was too easy, PayPal provides a bunch of other services. It can supply shops which are managed entirely by PayPal: you log into PayPal and create your products and set their prices, and PayPal will give you little HTML snippets to put into your site. And so you can have a shop with no server-side coding on your part at all. This is all very well, but PayPal puts all the documentation for these two completely different situations together into one massive, badly written, repetitive PDF document, and expects you to read it. PayPal’s documentation is easily the worst I’ve ever seen anywhere. It is eye-bleedingly awful. (Realex Payments, by contrast, has very well-written documents. They are to the point, self-contained, and clear.)

The only external checkout I’ve built worked with PayPal. Google Checkout is another provider in this area, and Realex Payments also provide an external checkout service. And, so, no doubt, do many others. As I said earlier, there’s nothing stopping you giving your customer the choice of all these and more.

4. Redirect checkout

The “redirect checkout” is basically an external checkout which pretends to be an internal checkout. The customer enters their credit card details into a form on our site, but the form submits to an external site, which then redirects straight back. The customer, unless they are paying close attention to their browser, does not even realize that they’ve left the site they were on. From a coding perspective, this is essentially the same as an external checkout, but from a user’s perspective, it’s the same as an internal one.

I’ve never actually built a site which used a redirect checkout, and couldn’t name a provider. I’m not making it up, though. It’s something I’m sure I’ve read about somewhere.

5. Telephone payments

Bonus method!

This is the one which doesn’t actually count as a “payment gateway”. The official definition of “payment gateway” is all about credit cards and suchlike, and this method does not require credit cards.

If every product on your site is the same price, and you’re fairly confident that most people will be buying only one product at a time (we’ve done this, for a site which accepted classified advertising), you can accept payment by premium telephone number. First, get a phone number which charges people a fixed price per call, rather than by the second. Then, set up your checkout to create a unique random four-digit number for each order. Then, set up Asterisk to answer the phone, accept input of the four-digit code, and send a signal to the site that the order has been paid.
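Generating the four-digit code is the easy part. Here is a Python sketch (not the code from the site in question) which assumes that the set of codes belonging to open, unpaid orders is available to check against.

```python
import secrets


def new_order_code(codes_in_use):
    """Return a random four-digit code not currently assigned to an open order.

    `codes_in_use` is a set of the codes for orders still awaiting payment;
    the new code is added to it before being returned.
    """
    if len(codes_in_use) >= 10_000:
        raise RuntimeError("all four-digit codes are in use")
    while True:
        code = f"{secrets.randbelow(10_000):04d}"
        if code not in codes_in_use:
            codes_in_use.add(code)
            return code
```

With only 10,000 possible codes, “unique” can only mean unique among the orders currently awaiting payment, so codes need to be released once an order is paid or expires.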

I did work on a site like this, but I had nothing to do with the Asterisk end of things. It’s a clever program for answering phones and managing phone menu options and phone trees.

* I did not say interface with. You can comment below to thank me for this, if you like.