
FontSite Archives

Part I: Quotes and Dashes
The ability to design with type has come a long way in the past few years, thanks primarily to all the wow-inducing page layout, word processing and illustration programs we use. But the next Big Thing (the web, the Internet, and especially our growing reliance on e-mail) is taking us backwards typographically.
The reason: information exchanges between different applications running on different systems are forced to rely on the lowest common denominator, in this case 7-bit ASCII text, in order to work. The term ASCII is an acronym for American Standard Code for Information Interchange, a standard that ensures the letter A, for instance, is the letter A whether it's on a Macintosh in San Diego or a PC in Hamburg. We take this for granted, but the mechanics behind the process can lead to a good deal of confusion when setting type on computers.
So let's start at the bottom and examine this lowest common denominator before discussing how to get above it.
The term 7-bit is tech-talk for 128. 7-bit means two to the seventh power (2 x 2 x 2 x 2 x 2 x 2 x 2), which equals 128. Now hold on to that thought for a moment. Computers store everything (words, pictures, dates) as numbers, or more precisely binary numbers, meaning the numbers are expressed using only the digits 1 and 0. If you look at a character chart, you'll see that the uppercase letter A is located at position 65, but this is the decimal, or base 10, way of expressing the number. In binary format, using only 1s and 0s, your computer recognizes it as 1000001. Notice there are seven digits. The largest number that can be represented with seven digits in binary format happens to be 127, or 1111111, and since computer programmers prefer to count from zero, not from one, we get 128 positions in all. But binary numbers don't make a hell of a lot of sense to us ten-fingered humans, so we tend to talk about them in ten-fingered terms, i.e., base ten, on things like character charts. The characters occupying positions zero through 127 on such charts make up the lowest common denominator, and all computers and operating systems (well, the ones we're concerned with anyway) are capable of exchanging text using them.
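The arithmetic above is easy to check for yourself. The following is a minimal sketch in Python (a language chosen here purely for illustration; the variable names are not from any particular tool) that looks up the position of the uppercase A and writes it out in binary.

```python
# Illustrative only: the 7-bit arithmetic described in the text.
code = ord("A")               # decimal position of uppercase A on a character chart
bits = format(code, "b")      # the same number written with only 1s and 0s

largest = int("1111111", 2)   # largest value that fits in seven binary digits
positions = 2 ** 7            # counting from zero, that gives this many positions

print(code, bits, len(bits))  # 65 1000001 7
print(largest, positions)     # 127 128
```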
You're quite familiar with these characters: they're the ones you can see on your keyboard keys, the ones we're forced to use in e-mail, and for the most part, web pages. They also happen to be the same characters found on common typewriters. So it would seem that yes, Robin, the Mac is a typewriter, along with everything else connected to the Internet.
So someone such as myself, someone who constantly harps on the use of tick marks instead of true quotes or apostrophes, or the use of hyphens instead of dashes, is forced to use tick marks, hyphens, and other lowly 7-bit characters whenever composing e-mail messages.
Fortunately it's only temporary. At some point those text files and e-mail messages sent over the Internet are caught and reformatted, elevated to higher typographic standards, before making it into print. I call this process document purification: the stripping and replacing of 7-bit ASCII punctuation and quotation marks with the proper 8-bit characters, as well as the purging of other typographical no-nos such as multiple spaces after periods, and the substitution of ellipsis, ligature, and fraction characters.
All of these characters reside in positions above 127, and are sometimes referred to as extended characters, or 8-bit ASCII. Two to the eighth power equals 256, so eight binary digits cover positions zero through 255, and anything from 128 up falls into this upper realm. A character such as the copyright symbol (©), for instance, is identified by the decimal value of 169, a number requiring eight binary digits to represent it: 10101001.
Most extended characters are assigned different numbers by the Macintosh and Windows operating systems (thanks, Apple; thanks, Microsoft), and this is the reason we have to rely on the 7-bit characters for e-mail. They're the only ones in common between the two systems (for the time being anyway, or until Unicode character encoding is universally supported by different operating systems and applications). An em dash, for example, is located at position 209 on the Mac, but 151 in Windows. And characters from the various European languages further complicate matters. Characters such as å, é, î, ø, ü and ÿ are located at positions 140, 142, 148, 191, 159 and 216 on the Macintosh, but 229, 233, 238, 248, 252 and 255 in Windows. Smart programs (most page layout, word processing and illustration programs) know how to figure out these differences. If you open a file created with the Macintosh version of QuarkXPress, the Windows version has no problem translating the em dash from the Mac 209 to its Windows seat of 151. But e-mail programs lack this intelligence, so an em dash will have to be expressed in an e-mail message by typing two or three hyphens. If this is text that will later be incorporated into printed documents, it will at some point have to be purified.
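To see the mismatch for yourself, here is a small sketch in Python, which ships with codecs for both character sets under the names mac_roman and cp1252. The helper name position and the translation step at the end are illustrative assumptions; they show one way a program could map a Mac byte to its Windows seat, not how QuarkXPress or any other product actually does it.

```python
# Illustrative sketch: the same characters sit at different positions
# in the classic Mac and Windows 8-bit character sets.
chars = ["\u2014", "å", "é", "î", "ø", "ü", "ÿ"]   # \u2014 is the em dash

def position(ch, codec):
    """Return the decimal byte value of a character in an 8-bit codec."""
    return ch.encode(codec)[0]

table = {ch: (position(ch, "mac_roman"), position(ch, "cp1252")) for ch in chars}
for ch, (mac, win) in table.items():
    print(repr(ch), "Mac:", mac, "Windows:", win)

# Hypothetical translation step: read the byte as Mac text, write it as Windows text.
mac_bytes = bytes([209])
win_bytes = mac_bytes.decode("mac_roman").encode("cp1252")
print(list(win_bytes))
```

Decoding with one character set and encoding with the other is the whole trick; the program only needs to know which system the file came from.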

It sounds rather mediaeval, document purification, and for me it actually is a very ritualistic procedure I undertake in the early phases of designing a document: the systematic conversion of a lowly 7-bit text file into the more princely, in typographic terms, 8-bit text. I like to complete the conversion before placing the text into PageMaker or Quark (although both of these programs have the ability to convert some characters automatically). There are a number of ways of going about it, but I advocate the Rigorous Systematic Approach over the haphazard one. The latter consists of simply scrolling through a document and changing any tick marks you spot to proper quotes, converting hyphens to dashes, removing extraneous spaces, inserting ligatures on the fly, etc. This Where's Waldo approach might work fine as a final copy edit, but it is not very efficient as your primary method for getting rid of these characters.
The next best approach to document purification is to use your application's Find and Replace commands to search for all unwanted 7-bit keyboard characters and replace them with the appropriate 8-bit characters, as well as locating multiple spaces and replacing them with single spaces. Removing all double spaces from a text file is always the first task I undertake. I then move on to some of the more rarely used characters before tackling the biggies (apostrophes, quotes and dashes), but this is just a matter of preference (er, I mean ritual). Ellipses are a good example. I'll instruct my text processor (I use a shareware program named Tex-Edit Plus for 90% of my word processing) to search for multiple periods and replace them with true ellipses, a relatively rare character. The true ellipsis character (created by pressing the Option and semicolon keys on the Mac, or by typing Alt-0133 in Windows) is a single character made up of three dots, but many people create a faux ellipsis by typing three or sometimes four consecutive periods.
Because some people type four periods to create an ellipsis, you should deal with these before searching for any three-period ellipses. That way, if both varieties happen to be present in the document, you replace the four-period ellipses first, then tackle the threes, never the threes before the fours. If you searched and replaced the three-period ellipses first, you would replace any four-period ellipses with a true ellipsis followed by a single period, which is worse, considering that the ellipsis character quite often does not mix well with the normal period (it depends on the typeface). The dot shape is the same, but the spacing between the dots is different. The same logic applies to searching for multiple spaces and replacing them with single spaces, or searching for hyphens to replace with dashes: to be on the safe side, search for three spaces (or hyphens) to replace with one before searching for two to replace with one. Any time you're searching for multiple items, always begin by searching for the highest number and work down.
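The highest-number-first rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name collapse_runs and the sample sentence are made up for the example, and following the advice above, a four-period run is replaced with a single true ellipsis.

```python
ELLIPSIS = "\u2026"   # the single-character ellipsis

def collapse_runs(text):
    """Replace the longest runs first, then work down."""
    # Ellipses: four periods before three.
    for run in ("....", "..."):
        text = text.replace(run, ELLIPSIS)
    # Spaces: three before two.
    for run in ("   ", "  "):
        text = text.replace(run, " ")
    return text

sample = "Wait....  and then...   nothing."
cleaned = collapse_runs(sample)
print(cleaned)

# The wrong order: threes before fours leaves a true ellipsis plus a stray period.
wrong = "Wait....".replace("...", ELLIPSIS).replace("....", ELLIPSIS)
print(wrong)
```

Note that a single pass like this only handles runs of the lengths listed. A run of five or more spaces would need the pass repeated (or longer runs added to the front of the list) before everything is down to single spaces.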
I then move on to quotes and dashes, which can be a bit trickier to replace manually because there are different types of quotes (opened and closed, double and single) and different types of dashes (em and en). You can't, for instance, replace all straight quotes in a document, which are identical opened or closed, with true quotes, which have two distinct styles for opened and closed. Both PageMaker and QuarkXPress obviate this problem by giving you the ability to convert quotes and dashes when you import a text or word processor file. In both programs, the feature is called Convert Quotes, but it converts double hyphens to em dashes as well. If you use one of these programs, you should turn on the Convert Quotes option whenever you import a text file. In PageMaker, you can turn this option on when you choose the Place command, and in Quark when you choose Get Text. It's quick, easy, and complete, but there is an annoying limitation: you can only convert quotes at the time you import the file; you can't automatically convert them after the file has been imported. You'll have to search and replace them manually.
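For the curious, here is a rough sketch in Python of how an automatic conversion of this kind could decide between opened and closed quotes. It is an assumption for illustration only, not the method PageMaker or QuarkXPress actually use: a straight quote at the start of the text, or right after a space or an opening bracket, is treated as an opening quote, and any other straight quote as a closing quote or apostrophe. The function name convert_quotes is hypothetical.

```python
def convert_quotes(text):
    """Heuristic conversion of straight quotes and double hyphens.

    Assumption: a quote preceded by whitespace, an opening bracket, or
    nothing at all is an opening quote; every other quote is closing.
    """
    out = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else " "
        opening = prev.isspace() or prev in "([{"
        if ch == '"':
            out.append("\u201c" if opening else "\u201d")
        elif ch == "'":
            out.append("\u2018" if opening else "\u2019")
        else:
            out.append(ch)
    # Double hyphens become em dashes (\u2014).
    return "".join(out).replace("--", "\u2014")

converted = convert_quotes('She said, "Don\'t go" -- and left.')
print(converted)
```

A rule this simple gets some cases wrong: a leading apostrophe (as in 'tis) comes out as an opening single quote, and a quote that directly follows a dash or another quote is treated as closing. That is one reason a final read-through is still worth doing.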
The Convert Quotes feature is different from the Smart Quotes feature, which converts straight quotes into curly quotes on the fly as you type, but does not change any existing straight quotes.
If you're not working in an application capable of automatic quote and dash conversion, you'll have to do it manually with search and replace commands. Start with the single straight quote, which is usually used as an apostrophe in 7-bit text. In British English, single quotes are often used where double quotes would be used in American English (and the other way around).
My main advice is: don't find and replace globally, or all at once. Replace them one at a time, starting from the beginning of the document and moving forward. Find a single tick mark; replace it with an apostrophe. Choose the Find Next command, then Replace Next, and so forth. Otherwise you run the risk (a guarantee, actually) of having closed quotes inserted in place of open quotes, and vice versa.
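If your text processor has no Find Next command, the one-at-a-time routine can be approximated with a short script. The sketch below, in Python, is only an illustration: the names find_ticks and replace_at are hypothetical. It lists each tick mark with a little surrounding context so you can decide what belongs there, then replaces a single position at a time.

```python
def find_ticks(text, width=12):
    """Yield (position, character, surrounding context) for each straight quote."""
    for i, ch in enumerate(text):
        if ch in "'\"":
            start = max(0, i - width)
            yield i, ch, text[start:i + width + 1]

def replace_at(text, position, replacement):
    """Replace the single character at position and leave the rest alone."""
    return text[:position] + replacement + text[position + 1:]

sample = "It's a \"test\"."
hits = list(find_ticks(sample))
for pos, ch, context in hits:
    print(pos, ch, repr(context))

# After looking at the context, replace just the first tick with an apostrophe.
fixed = replace_at(sample, hits[0][0], "\u2019")
print(fixed)
```

Because each replacement swaps one character for one character, the remaining positions stay valid as you work forward through the list.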