PlainTalk is a simple network protocol syntax. Maybe you can pick it up by testing the following example protocol interactively? There are already example messages to the server (→) and from the server (←) there, as well as suggestions for what you could write in the input box:
PlainTalk is a message and field framing protocol that is human readable and writable and efficiently processable by computers. PlainTalk was designed by Rolf W. Rasmussen as a simplification and generalization of the IMAP protocol syntax, for use in real time graphics systems for broadcast TV.
When you are designing a network protocol, you can use PlainTalk as a layer on top of TCP to get to work with messages and fields instead of a stream of bytes. It fits in the TCP/IP stack as a sublayer of the application layer, using the byte stream interface of TCP below and providing a message stream interface to the rest of the application layer above.
The following is an example of what a network session using a PlainTalk-based protocol might look like, with messages going both to the server (→) and coming from the server (←). Each message is terminated by a newline, and fields are separated by a space.
→ 0 protocol doubletalk
← 0 protocol doubletalk
→ 1 define ignorance
← 1 ok {21}ignorance is strength
In the second message from the server, “ignorance is strength” is one field with embedded space characters. It is possible to transmit control characters by using the sequence “{n}”, followed by n bytes to be read verbatim, without interpretation. An escape sequence can appear anywhere within a field, so the byte sequence “{21}ignorance is strength” is exactly equivalent to both “ignorance{1} is{1} strength” and “ignoranc{7}e is strength”.
This makes it easy to implement streaming of arbitrary binary data.
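As a rough illustration of the sending side, the sketch below builds one message from a list of fields. It is written in Python, which is a choice made for this example and not something PlainTalk prescribes, and the names encode_field and encode_message are hypothetical. It escapes a field as a whole whenever the field is empty or contains a space, CR, LF or "{"; escaping only the offending bytes would produce an equivalent message.

```python
from typing import Iterable

# Bytes that carry meaning in the PlainTalk framing: space, LF, CR and "{".
SPECIAL = frozenset(b" \n\r{")


def encode_field(field: bytes) -> bytes:
    """Return the wire form of one field (hypothetical helper)."""
    if field and not any(b in SPECIAL for b in field):
        return field  # nothing to escape, send the bytes as they are
    # Empty fields become "{0}"; other fields get a length-prefixed escape.
    return b"{%d}" % len(field) + field


def encode_message(fields: Iterable[bytes]) -> bytes:
    """Join fields with single spaces and terminate the message with LF."""
    return b" ".join(encode_field(f) for f in fields) + b"\n"


print(encode_message([b"1", b"ok", b"ignorance is strength"]))
# b'1 ok {21}ignorance is strength\n'
```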
Basically, that's it; newline to separate messages, space to separate fields and an escaping mechanism. If you would like to test the protocol interactively, use the simulated telnet session above. Otherwise, you can go on to read the definition in English or jump directly to the definition in Augmented Backus Naur Form (ABNF).
When you have a grasp of the basic syntax, continue on to the protocol design recommendations.
For a more terse and formal definition, see the definition in Augmented Backus Naur Form below.
PlainTalk wants to separate an incoming stream of bytes into messages. Messages are just an ordered sequence of fields, and each field is simply a bunch of bytes.
So we start off expecting a message, which expects to start with a field, and a field expects to contain some number (possibly zero) of bytes. As we read bytes, we know that they belong to the current field we are reading, with the following exceptions:
If we see an ASCII space character (" "), this signifies the end of the current field, and the immediate start of a new field. In other words, space is a field separator. After reading the space, we continue reading bytes as before, but now we read into the new field.
The next field may have a payload of zero bytes, so the next incoming byte might be another space. Two space characters in a row signify the end of two consecutive fields. In other words, white space does not get special treatment: a byte with the ASCII value 0x20 is a field separator. This is the only meaning of the space.
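For a message that contains no escape sequences, the space rule behaves like a split on every single 0x20 byte. The Python sketch below shows this with bytes.split and an explicit separator (calling split with no argument would collapse runs of white space, which is not what PlainTalk does). The function name split_unescaped is hypothetical, and the input is assumed to be one message body with its terminator already removed.

```python
def split_unescaped(message: bytes) -> list:
    """Split an escape-free message body into fields on every single space."""
    return message.split(b" ")


# Two spaces in a row end two fields, so an empty field appears between them.
print(split_unescaped(b"1 define  ignorance"))
# [b'1', b'define', b'', b'ignorance']
```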
If we see an ASCII line feed (LF, "\n", 0x0a), this signifies the end of the current message, implying the end of the current field and the start of a new message. In other words, LF is a message terminator (or separator). After reading the LF, we continue reading bytes as before, but now we read into the new message's first field.
If we see an ASCII carriage return (CR, "\r", 0x0d), we interpret it as part of a CRLF sequence. The next byte in the protocol must be LF, which is handled as specified above. Anything else will cause the receiving end to terminate the connection.
We treat CR this way to be polite to users using the telnet program, which enforces network style line endings (which incidentally are the same as Windows style line endings).
If we see an ASCII opening curly bracket ("{"), we are looking at an escape sequence. The escape sequence consists of opening curly, a decimal number written in ASCII, a closing curly ("}") and then a sequence of escaped bytes. The number in the curlies specifies how many escaped bytes are following.
The bytes following the closing curly must be read verbatim. Inside the escape sequence, any byte, special or not, can appear without further escaping. This makes it easy to transport a binary stream.
As an example, consider the ASCII sequence “{5}O HAI”. Here we have an escape sequence with five bytes following the closing curly bracket. This matches “O HAI”. The space in this escape sequence does not act as a field separator, since it is escaped.
Only decimal digits are allowed inside the curlies (“{1k}” is not allowed). A violation of this rule causes the connection to be terminated.
Leading zeroes are allowed, so “{005}O HAI” is equivalent to “{5}O HAI”.
Zero length escape sequences are allowed. It is good style to encode zero-length fields as “{0}”, for the convenience of human readers.
For simplicity of implementation, it is permitted to omit the number entirely (“{}” is OK). This means the same thing as “{0}”. To ease interpretation for humans, it is preferable to encode zero-length escape sequences as “{0}”.
There is no set upper bound on the size of any escape sequence. However, there is no reason to have excessively large escape sequences either. It is efficient enough to separate a huge binary stream into chunks of, say, 1 MB, and escape each chunk individually. Implementations should take care to avoid problems relating to integer overflows and buffer overflows with respect to the escape sequences. It is reasonable to terminate the connection on receiving escape sequences that are larger than what the receiver is prepared to handle.
Avoid generating escape sequences of 2 GB or more, since the length then exceeds the upper limit of a 32-bit signed integer. Also be advised that if you are using PlainTalk to communicate with an embedded device, it might only allow escape sequences of, for example, just under 32 KB, so that the length fits in a 16-bit signed integer.
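The chunking advice above could look roughly like the following Python sketch. The function name escape_chunks and the 1 MiB default are assumptions made for this example. Because escape sequences may follow one another directly within a field, the yielded pieces can be written to the connection back to back and still form a single field.

```python
from typing import Iterator


def escape_chunks(data: bytes, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield data as consecutive escape sequences of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not data:
        yield b"{0}"  # readable encoding of an empty payload
        return
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        yield b"{%d}" % len(chunk) + chunk


# Tiny chunk size used here only to keep the output readable.
print(b"".join(escape_chunks(b"abcdefg", chunk_size=3)))
# b'{3}abc{3}def{1}g'
```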
After an escape sequence, we continue reading bytes into the current field. An escape sequence may appear directly after another escape sequence.
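The escape sequence rules can be sketched in Python as a small helper that reads one escape sequence from a buffer. Everything named here (read_escape, ProtocolError, MAX_ESCAPE) is hypothetical, and the 1 MiB cap is only an assumed receiver limit, not a value defined by PlainTalk. The sketch also assumes the whole escape sequence is already in the buffer; a streaming receiver would wait for more bytes instead of failing on a short payload.

```python
MAX_ESCAPE = 1 << 20  # assumed receiver limit (1 MiB), not part of PlainTalk


class ProtocolError(Exception):
    """Raised where a receiver would terminate the connection."""


def read_escape(buf: bytes, pos: int):
    """Parse the escape sequence whose "{" is at buf[pos].

    Returns (payload, next_pos), where next_pos is the index of the first
    byte after the escaped payload.
    """
    close = buf.find(b"}", pos + 1)
    if close == -1:
        raise ProtocolError("unterminated escape header")
    raw = buf[pos + 1:close]
    if raw and not raw.isdigit():  # e.g. "{1k}" is not allowed
        raise ProtocolError("non-digit in escape header")
    digits = raw.lstrip(b"0")  # leading zeroes are allowed
    if len(digits) > len(str(MAX_ESCAPE)):
        raise ProtocolError("escape sequence too large")
    length = int(digits) if digits else 0  # "{}" and "{0}" both mean zero
    if length > MAX_ESCAPE:
        raise ProtocolError("escape sequence too large")
    start = close + 1
    if start + length > len(buf):
        raise ProtocolError("escape payload truncated")
    return buf[start:start + length], start + length


print(read_escape(b"{5}O HAI", 0))
# (b'O HAI', 8)
```

Checking the digit count before converting keeps a hostile header such as a very long run of digits from being turned into a huge integer.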
When terminating the connection due to error conditions (such as CR without subsequent LF, or problems in the escape sequences), it can be a good idea to issue an error message to the sender. Whether or not this is advisable or even possible is dependent on the domain specific protocol in use on the layer above PlainTalk.
If the end of the current message is reached before any bytes have been read into the first field of the message, the message is ignored. This allows a user to type in multiple consecutive line breaks without them being interpreted as messages containing a single empty field.
This fully defines the protocol.
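Putting the rules together, a receiver might look something like the Python sketch below. It is a minimal illustration under stated assumptions, not a reference implementation: the names decode and ProtocolError are hypothetical, the input is a complete byte string and not a live stream, no size cap is applied to escape sequences, and a message is treated as empty (and ignored) only when its terminator arrives before any other byte of that message. Trailing bytes without a terminator are treated as an incomplete message and dropped.

```python
class ProtocolError(Exception):
    """Raised where a receiver would terminate the connection."""


def decode(data: bytes) -> list:
    """Split data into messages, each a list of fields (bytes objects)."""
    messages, fields, field = [], [], bytearray()
    seen = False  # has any byte been read for the current message?
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b == 0x0D:  # CR is only valid as the first half of CRLF
            if i + 1 >= n or data[i + 1] != 0x0A:
                raise ProtocolError("CR without subsequent LF")
            i += 1
            b = 0x0A
        if b == 0x0A:  # LF: end of field and end of message
            if seen:
                fields.append(bytes(field))
                messages.append(fields)
            fields, field, seen = [], bytearray(), False
            i += 1
        elif b == 0x20:  # space: end of field, start of the next one
            fields.append(bytes(field))
            field = bytearray()
            seen = True
            i += 1
        elif b == 0x7B:  # "{": escape sequence, payload is read verbatim
            close = data.find(b"}", i + 1)
            if close == -1:
                raise ProtocolError("unterminated escape header")
            digits = data[i + 1:close]
            if digits and not digits.isdigit():
                raise ProtocolError("non-digit in escape header")
            length = int(digits) if digits else 0
            start = close + 1
            if start + length > n:
                raise ProtocolError("escape payload truncated")
            field += data[start:start + length]
            seen = True
            i = start + length
        else:  # ordinary byte: belongs to the current field
            field.append(b)
            seen = True
            i += 1
    return messages


print(decode(b"0 protocol doubletalk\r\n\n1 ok {21}ignorance is strength\n"))
# [[b'0', b'protocol', b'doubletalk'], [b'1', b'ok', b'ignorance is strength']]
```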
For tips on how to design application level protocols using this syntax, read the protocol design recommendations.
For a less formal definition, see the definition in English above.
PlainTalk can be precisely defined in Augmented Backus Naur Form (RFC2234), with the exception of the <escape-sequence>, which requires interpretation of an integer represented in ASCII.
plaintalk = *( message message-terminator )
message-terminator = CRLF / LF
message = "" ; Messages matching this production are to be ignored
message =/ field *( field-separator field )
field-separator = SP
field = *( safe-data / escape-sequence )
safe-data = *safe-byte
; A <safe-byte> is a binary octet that is not CR, LF, SP or "{":
safe-byte = %x00-09 / %x0b-0c / %x0e-1f / %x21-7a / %x7c-ff
; n below is the number specified inside the "{", "}" pair, signifying
; that the "}" is followed by as many binary octets as specified.
escape-sequence = "{" number "}" nOCTET
number = *DIGIT