FontSite Archives

Part I: Quotes and Dashes

The ability to design with type has come a long way in the past few years thanks primarily to all the wow-inducing page layout, word processing and illustration programs we use, but the next Big Thing — the web, the Internet, and especially our growing reliance on e-mail — is taking us backwards typographically.
The reason: information exchanges between different applications running on different systems are forced to rely on the lowest common denominator, in this case 7-bit ASCII text, in order to work. The term ASCII is an acronym for American Standard Code for Information Interchange, a standard that ensures the letter A, for instance, is the letter A no matter whether it’s on a Macintosh in San Diego, or a PC in Hamburg. We take this for granted, but the mechanics behind the process can lead to a good deal of confusion when setting type on computers.
So let’s start at the bottom and examine this lowest common denominator before discussing how to get above it.
The term 7-bit is tech-talk for 128. 7-bit means “two to the seventh power” (2 x 2 x 2 x 2 x 2 x 2 x 2), which equals 128. Now hold on to that thought for a moment. Computers store everything — words, pictures, dates — as numbers, or more precisely binary numbers, meaning the numbers are expressed using only the digits 1 and 0. If you look at a character chart, you’ll see that the uppercase letter A is located at position 65, but this is the decimal, or base 10, way of expressing the number. In binary format, using only 1s and 0s, your computer recognizes it as 1000001. Notice there are seven digits. The largest number that can be represented with seven digits in binary format happens to be 127, or 1111111, and since computer programmers prefer to count from zero not from one, we get 128. But binary numbers don’t make a hell of a lot of sense to us ten-fingered humans, so we tend to talk about them in ten-fingered terms, i.e., base ten, on things like character charts. The characters occupying positions zero through 127 on such charts comprise the lowest common denominator, and all computers and operating systems (well, the ones we’re concerned with anyway) are capable of exchanging text using them.
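The arithmetic above is easy to check for yourself. Here is a minimal sketch in Python (my choice of language for illustration, nothing the character charts depend on) that prints the decimal and binary positions of the letter A and counts how many values seven binary digits can hold.

```python
# Position of the uppercase letter A on the character chart, in decimal.
position = ord("A")

# The same number written in binary, using only 1s and 0s.
binary = format(position, "b")

# Seven binary digits give 2**7 distinct values, numbered 0 through 127.
slots = 2 ** 7
largest = int("1111111", 2)

print(position, binary, len(binary))
print(slots, largest)
```

The count is 128 while the largest position is 127, because the numbering starts at zero.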
You’re quite familiar with these characters: they’re the ones you can see on your keyboard keys, the ones we’re forced to use in e-mail, and for the most part, web pages. They also happen to be the same characters found on common typewriters. So it would seem that yes, Robin, the Mac is a typewriter … along with everything else connected to the Internet.
So someone like me, who constantly harps on the use of tick marks instead of true quotes or apostrophes, or the use of hyphens instead of dashes, is forced to use tick marks, hyphens, and other lowly 7-bit characters when composing e-mail messages.
Fortunately it’s only temporary. At some point those text files and e-mail messages sent over the Internet are caught and reformatted, elevated to higher typographic standards, before making it into print. I call this process document purification, the stripping and replacing of 7-bit ASCII punctuation and quotation marks with the proper 8-bit characters, as well as the purging of other typographical no-nos such as multiple spaces after periods, and the substitution of ellipsis, ligature, and fraction characters.
All of these characters reside in positions above 127, and are sometimes referred to as extended characters, or 8-bit ASCII. Two to the eighth power equals 256, so anything higher than 127 falls into this upper realm. A character such as the copyright © symbol, for instance, is identified by the decimal value of 169, a number requiring eight binary digits to represent it: 10101001.
Most extended characters are assigned different numbers by the Macintosh and Windows operating systems (thanks Apple, thanks Microsoft), and this is the reason we have to rely on the 7-bit characters for e-mail. They’re the only ones in common between the two systems (for the time being anyway, or until Unicode character encoding is universally supported by different operating systems and applications). An em dash, for example, is located at position 209 on the Mac, but 151 in Windows. And characters from the various European languages further complicate matters. Characters such as å, é, î, ø, ü and ÿ are located at positions 140, 142, 148, 191, 159 and 216 on the Macintosh, but 229, 233, 238, 248, 252 and 255 in Windows. Smart programs (most page layout, word processing and illustration programs) know how to figure out these differences. If you open a file created with the Macintosh version of QuarkXPress, the Windows version has no problem translating the em dash from the Mac 209 to its Windows seat of 151. But e-mail programs lack this intelligence, so an em dash will have to be expressed in an e-mail message by typing two or three hyphens. If this is text that will later be incorporated into printed documents, it will at some point have to be purified.
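The position differences are easy to see in a few lines of code. The sketch below is in Python and assumes that its standard `mac_roman` and `cp1252` codecs are fair stand-ins for the Macintosh and Windows character charts discussed here; the helper name `position` is made up for the example.

```python
# Assumption: Python's "mac_roman" and "cp1252" codecs stand in for the
# Macintosh and Windows character charts described in the text.
def position(char: str, chart: str) -> int:
    """Return the decimal position of a single character on a chart."""
    return char.encode(chart)[0]

# "\u2014" is the em dash; the rest are the European characters from the text.
for char in ["\u2014", "å", "é", "î", "ø", "ü", "ÿ"]:
    print(repr(char), position(char, "mac_roman"), position(char, "cp1252"))

# What a program without this knowledge shows when the Mac em dash
# (position 209) is read as if it were Windows text.
misread = bytes([209]).decode("cp1252")
print(repr(misread))
```

A program that does the translation simply decodes with one chart and encodes with the other. A program that doesn't will display whatever character happens to sit at the same position on its own chart.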

It sounds rather mediaeval, document purification, and for me it actually is a very ritualistic procedure I undertake in the early phases of designing a document: the systematic conversion of a lowly 7-bit text file into the more princely, in typographic terms, 8-bit text. I like to complete the conversion before placing the text into PageMaker or Quark (although both of these programs have the ability to convert some characters automatically). There are a number of ways of going about it, but I advocate the Rigorous Systematic Approach over the haphazard one. The latter consists of simply scrolling through a document and changing any tick marks you spot to proper quotes, converting hyphens to dashes, removing extraneous spaces, inserting ligatures on the fly, etc. This “Where’s Waldo” approach might work fine as a final copy edit, but is not very efficient as your primary method for getting rid of these characters.
The next best approach to document purification is to use your application’s Find and Replace commands to search for all unwanted 7-bit keyboard characters and replace them with the appropriate 8-bit characters, as well as locating multiple spaces and replacing them with single spaces. Removing all double spaces from a text file is always the first task I undertake. I then move on to some of the more rarely used characters before tackling the biggies — apostrophes, quotes and dashes — but this is just a matter of preference (er, I mean ritual). Ellipses are a good example. I’ll instruct my text processor (I use a shareware program named Tex-Edit Plus for 90% of my word processing) to search for multiple periods and replace them with true ellipses, a relatively rare character. The true ellipsis character (created by pressing the Option and semi-colon keys on the Mac, or by typing Alt-0133 in Windows) is a single character made up of three dots, but many people create a faux ellipsis by typing three or sometimes four consecutive periods.
Because some people type four periods to create an ellipsis, you should deal with these before searching for any three-period ellipses. That way, if both varieties happen to be present in the document, you can replace the four-period ellipses first, then tackle the threes, but not the threes before the fours. If you searched and replaced the three-period ellipses first, you would replace any four-period ellipses with a true ellipsis followed by a single period, which is worse, considering that the ellipsis character quite often does not mix well with the normal period (depends on the typeface). The dots have the same shape, but the spacing between them is different. The same logic applies to searching for multiple spaces and replacing them with single spaces, or searching for hyphens to replace with dashes: to be on the safe side, search for three spaces (or hyphens) to replace with one before searching for two to replace with one. Any time you’re searching for multiple items, always begin by searching for the highest number and work down.
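The highest-number-first rule can be sketched in a few lines of Python. This is only an illustration of the ordering, not how any particular text processor does it, and the function names are invented for the example.

```python
ELLIPSIS = "\u2026"  # the true single-character ellipsis

def purify_periods(text: str) -> str:
    """Replace four-period runs first, then three-period runs."""
    for run in ("....", "..."):
        text = text.replace(run, ELLIPSIS)
    return text

def wrong_order(text: str) -> str:
    """Threes first: a four-period run becomes an ellipsis plus a stray period."""
    for run in ("...", "...."):
        text = text.replace(run, ELLIPSIS)
    return text

def collapse_spaces(text: str) -> str:
    """Replace three spaces with one, then two with one, until none remain."""
    for run in ("   ", "  "):
        while run in text:
            text = text.replace(run, " ")
    return text

sample = "Wait.... what...  now?"
print(purify_periods(sample))
print(wrong_order(sample))
print(collapse_spaces(purify_periods(sample)))
```

Run on the sample, `wrong_order` leaves a stray period after the first ellipsis, which is exactly the result the ordering is meant to avoid.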
I then move on to quotes and dashes, which can be a bit trickier to replace manually because there are different types of quotes (opened and closed, double and single) and different types of dashes (em and en). You can’t, for instance, replace all straight quotes in a document, which are identical opened or closed, with true quotes, which have two distinct styles for opened and closed. Both PageMaker and QuarkXPress obviate this problem by giving you the ability to convert quotes and dashes when you import a text or word processor file. In both programs, the feature is called “Convert Quotes,” but it converts double hyphens to em-dashes as well. If you use one of these programs, you should turn on the Convert Quotes option whenever you import a text file. In PageMaker, you can turn this option on when you choose the Place command, and in Quark when you choose Get Text. It’s quick, easy, and complete, but there is an annoying limitation: you can only convert quotes at the time you import the file — you can’t automatically convert them after the file has been imported. You’ll have to search and replace them manually.
The Convert Quotes feature is different from the Smart Quotes feature, which converts straight quotes into curly quotes on the fly as you type, but does not change any existing straight quotes.
If you’re not working in an application capable of automatic quote and dash conversion, you’ll have to do it manually with search and replace commands. Start with the single straight quote, which is usually used as an apostrophe in 7-bit text. In British English, single quotes are often used where double quotes would be used in American English (and the other way around).
My main advice is, don’t find and replace globally, or all at once — replace them one at a time, starting from the beginning of the document and moving forward. Find a single tick mark; replace it with an apostrophe. Choose the Find Next command, then Replace Next, and so forth. Otherwise you run the risk (a guarantee actually) of having closed quotes inserted in place of open quotes, and vice versa.
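If you would rather script this step than click through Find Next by hand, the open-or-closed decision can be approximated by looking at the character in front of each tick mark. The Python sketch below rests on a rule I am assuming purely for illustration (it is not how PageMaker or QuarkXPress decide): a tick mark at the start of the text, or after a space, an opening bracket, or another opening quote, is treated as an opening quote, and anything else as a closing quote or apostrophe. It will get some cases wrong, such as a leading apostrophe in '90s, so a final read-through is still needed.

```python
# Hypothetical rule, not taken from any real program: the character just
# before a straight quote decides whether it opens or closes.
OPEN_DOUBLE, CLOSE_DOUBLE = "\u201c", "\u201d"
OPEN_SINGLE, CLOSE_SINGLE = "\u2018", "\u2019"  # CLOSE_SINGLE doubles as the apostrophe

def curl_quotes(text: str) -> str:
    """Convert straight quotes to curly ones using the preceding character."""
    out = []
    for ch in text:
        if ch in "'\"":
            # Look at the already-converted text so nested quotes work.
            prev = out[-1] if out else " "
            opening = prev.isspace() or prev in "([{" + OPEN_DOUBLE + OPEN_SINGLE
            if ch == '"':
                out.append(OPEN_DOUBLE if opening else CLOSE_DOUBLE)
            else:
                out.append(OPEN_SINGLE if opening else CLOSE_SINGLE)
        else:
            out.append(ch)
    return "".join(out)

print(curl_quotes('"Don\'t," she said.'))
```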

 

 


Add/Strip

 

The Best Way

My preferred method of document purification relies on a very useful shareware utility called Add/Strip. This simple yet remarkable program processes text files (it doesn’t work with other file types), removing extra spaces, converting 7-bit characters, even cleaning up text files created on PCs, which usually contain unnecessary control characters and carriage returns at the end of every line, making the task of manually removing them a tedious one. Add/Strip cleans them up in a snap (and so does my favorite text editor, Tex-Edit Plus for that matter). Neither program can substitute fraction characters from expert fonts, however, so you’ll still have to do that manually.
Add/Strip is available from several shareware sites on the web, including Cnet’s Shareware.com. The latest version is 3.4.1. The shareware fee is only $25.
Add/Strip is designed to automatically perform much of the work of cleaning up text files destined for import to page layout programs, as well as export from page layout programs to simple DOS-based or mainframe computers, all in a fraction of the time needed to manually reformat using Find and Replace commands.
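To give a feel for the kind of cleanup involved, here is a rough Python sketch of tidying a text file that came from a PC. It is not Add/Strip's actual behavior, which I can't vouch for line by line. The function name and the assumption that a blank line marks a paragraph break are mine.

```python
import re

def clean_pc_text(raw: str) -> str:
    """Normalize line endings, drop control characters, and unwrap lines."""
    # Turn DOS (CR LF) and bare CR line endings into plain LF.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Drop control characters other than tab and newline.
    text = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", text)
    # Assumption: a blank line marks a paragraph break; the returns at the
    # end of every other line are unwanted and become single spaces.
    paragraphs = re.split(r"\n\s*\n", text)
    joined = [
        " ".join(line.strip() for line in p.split("\n") if line.strip())
        for p in paragraphs
    ]
    return "\n\n".join(p for p in joined if p)

sample = "First line\r\nof text.\r\n\r\nSecond\r\nparagraph.\x1a"
print(clean_pc_text(sample))
```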

A screen capture from Add/Strip

 


I’ve been searching for a Windows equivalent to Add/Strip, but have so far been unsuccessful. There are several search-and-replace utilities available, but none with any meaningful typographic controls (ligature substitution, straight-to-curly quote conversion, etc.). A new version of the excellent Super Note Tab, probably the best text editor available for Windows, has just been released. It approaches in many ways the awesome Tex-Edit Plus for the Mac, but falls short when it comes to purifying text documents. If anyone out there is aware of such utilities for Windows, please let us know.

Next month I continue this article with a step-by-step approach to substituting ligature characters and a discussion of working with expert fonts.