vmime/tests/parser
Jan Engelhardt d296c2d1d5
vmime: prevent loss of a space during text::createFromString (#306)
```
mailbox(text("Test München West", charsets::UTF_8), "a@b.de").generate();
```

produces

```
=?us-ascii?Q?Test_?= =?utf-8?Q?M=C3=BCnchen?= =?us-ascii?Q?West?= <a@b.de>
```

The first space between ``Test`` and ``München`` is encoded as an
underscore along with the first word: ``Test_``. The second space
between ``München`` and ``West`` is encoded with neither of the two
words and thus lost. Decoding the text results in ``Test
MünchenWest`` instead of ``Test München West``.
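
The loss can be reproduced outside vmime. The following is a minimal sketch of a display-side RFC 2047 decoder, not vmime's parser: ``decodeQWord`` and ``decodeHeader`` are hypothetical helpers, only Q-encoding is handled, whitespace runs are collapsed, and the charset label is ignored (bytes are passed through unchanged). It applies the RFC 2047 rule that whitespace between two adjacent encoded-words is not displayed.

```cpp
#include <cctype>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: if `tok` has the form =?charset?Q?payload?=, store the
// decoded payload bytes in `decoded` and return true. The charset is ignored.
bool decodeQWord(const std::string& tok, std::string& decoded) {
    decoded.clear();
    if (tok.size() < 8 || tok.compare(0, 2, "=?") != 0 ||
        tok.compare(tok.size() - 2, 2, "?=") != 0) {
        return false;
    }
    const std::size_t q1 = tok.find('?', 2);  // '?' that ends the charset
    if (q1 == std::string::npos || q1 + 3 > tok.size() - 2) return false;
    if ((tok[q1 + 1] != 'Q' && tok[q1 + 1] != 'q') || tok[q1 + 2] != '?') return false;

    const std::string payload = tok.substr(q1 + 3, tok.size() - 2 - (q1 + 3));
    for (std::size_t i = 0; i < payload.size(); ++i) {
        const char c = payload[i];
        if (c == '_') {
            decoded += ' ';  // Q-encoding writes a space as an underscore
        } else if (c == '=' && i + 2 < payload.size() &&
                   std::isxdigit(static_cast<unsigned char>(payload[i + 1])) &&
                   std::isxdigit(static_cast<unsigned char>(payload[i + 2]))) {
            decoded += static_cast<char>(
                std::strtol(payload.substr(i + 1, 2).c_str(), nullptr, 16));
            i += 2;
        } else {
            decoded += c;
        }
    }
    return true;
}

// Hypothetical helper: decode a whitespace-separated header value. The
// separator is displayed unless it sits between two encoded-words.
std::string decodeHeader(const std::string& header) {
    std::istringstream in(header);
    std::string tok, out;
    bool prevEncoded = false;
    bool first = true;
    while (in >> tok) {
        std::string text;
        const bool encoded = decodeQWord(tok, text);
        if (!encoded) text = tok;
        if (!first && !(prevEncoded && encoded)) out += ' ';
        out += text;
        prevEncoded = encoded;
        first = false;
    }
    return out;
}
```

Given the display-name part of the header above, ``decodeHeader`` yields ``Test MünchenWest``: all three tokens are encoded-words, so neither separator is displayed, and only the first space survives because it is part of the ``Test_`` payload.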

This is caused by how ``vmime::text::createFromString()`` handles
transitions between 7-bit and 8-bit words: if an 8-bit word follows a
7-bit word, a space is appended to the previous word. In the opposite
case, a 7-bit word following an 8-bit word, no space is appended.

When one fixes this problem, a follow-up issue appears:

``text::createFromString("a b\xFFc d")`` tokenizes the input into
``m_words={word("a "), word("b\xFFc ", utf8), word("d")}``. Attaching
the whitespace to the end of each word ("right-side alignment") is a
problem for ``word::generate()``:

As per RFC 2047, spaces between adjacent encoded words are only
separators and are not displayed. A space between an encoded word
and regular ASCII text is a separator that is also displayed.

When ``word::generate()`` outputs the b-word, it would have to strip
one space, but only when there is a transition from an encoded word to
an unencoded word. ``word::generate()`` does not know whether ``d``
will be encoded or unencoded.
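
To make the problem concrete, the sketch below models right-aligned tokenization in isolation. ``tokenizeRightAligned`` is a hypothetical stand-in, not vmime code: it only splits at single spaces and ignores charsets and any merging of adjacent words that the real function performs.

```cpp
#include <string>
#include <vector>

// Hypothetical model: every space stays attached to the END of the word
// that precedes it ("right-side alignment").
std::vector<std::string> tokenizeRightAligned(const std::string& in) {
    std::vector<std::string> out;
    std::string cur;
    for (char c : in) {
        cur += c;
        if (c == ' ') {  // the space closes the current word
            out.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) out.push_back(cur);
    return out;
}
```

For the input bytes ``a b\xFFc d`` this yields ``"a "``, ``"b\xFFc "`` and ``"d"``. Whether the trailing space of the middle word must be encoded or emitted as a plain separator depends on the word that follows it. (In real C++ source the literal has to be split, for example ``"a b\xFF" "c d"``, because ``\xFFc`` would otherwise be read as a single hex escape.)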

The idea is to change the tokenization of
``text::createFromString`` such that whitespace is at the *start* of
words rather than at the end. With that, ``word::generate()`` need not
know anything about the next word, only about the *previous* one.
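
A minimal sketch of the left-aligned variant, under the same simplifications as before (``tokenizeLeftAligned`` is a hypothetical name, splitting happens at single spaces only, charsets are ignored):

```cpp
#include <string>
#include <vector>

// Hypothetical model: every space is attached to the START of the word
// that follows it ("left-side alignment").
std::vector<std::string> tokenizeLeftAligned(const std::string& in) {
    std::vector<std::string> out;
    std::string cur;
    for (char c : in) {
        if (c == ' ' && !cur.empty()) {  // a space opens a new word
            out.push_back(cur);
            cur.clear();
        }
        cur += c;
    }
    if (!cur.empty()) out.push_back(cur);
    return out;
}
```

The same input bytes ``a b\xFFc d`` now yield ``"a"``, ``" b\xFFc"`` and ``" d"``. How a leading space is treated can be decided from the word that has already been generated.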

Thus, in this patch,

1. The tokenization of ``text::createFromString`` is changed to
   left-align spaces and the function is fixed to account for
   the missing space on transition.
2. ``word::generate`` learns how to steal a space character.
3. Testcases are adjusted to account for the shifted
   position of the space.
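
One possible reading of "stealing a space" is sketched below. This is an illustrative model and not the code of the patch: ``Piece``, ``qEncode`` and ``generateHeader`` are hypothetical names, the charset is fixed to ``utf-8``, line folding is ignored, and every word except the first is assumed to start with a space, as left-aligned tokenization produces.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for a vmime word: raw bytes plus an "encode me" flag.
struct Piece {
    std::string bytes;
    bool encode;
};

// Hypothetical helper: Q-encode `s` as a single encoded-word.
std::string qEncode(const std::string& s, const std::string& charset) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out = "=?" + charset + "?Q?";
    for (unsigned char c : s) {
        if (c == ' ') {
            out += '_';
        } else if (std::isalnum(c)) {
            out += static_cast<char>(c);
        } else {
            out += '=';
            out += hex[c >> 4];
            out += hex[c & 15];
        }
    }
    return out + "?=";
}

// Only the PREVIOUS word decides what happens to a leading space.
std::string generateHeader(const std::vector<Piece>& words) {
    std::string out;
    bool prevEncoded = false;
    bool first = true;
    for (const Piece& w : words) {
        if (!w.encode) {
            out += w.bytes;  // a leading space is emitted literally
        } else {
            std::string body = w.bytes;
            if (!first) {
                out += ' ';  // an encoded-word must be delimited by whitespace
                // After unencoded text the delimiter is displayed, so it
                // replaces ("steals") the leading space of this word. After an
                // encoded-word it is not displayed, so the space stays in the
                // payload as '_'.
                if (!prevEncoded && !body.empty() && body[0] == ' ') {
                    body.erase(0, 1);
                }
            }
            out += qEncode(body, "utf-8");
        }
        prevEncoded = w.encode;
        first = false;
    }
    return out;
}
```

With the words ``"Test"``, ``" München"`` (encoded) and ``" West"``, this model emits ``Test =?utf-8?Q?M=C3=BCnchen?= West``, in which both spaces are displayed after decoding. Two consecutive encoded words keep their space inside the second payload instead.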

Fixes: #283, #284

Co-authored-by: Vincent Richard <vincent@vincent-richard.net>
2024-05-21 15:55:06 +02:00
| File | Last commit | Date |
|---|---|---|
| attachmentHelperTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| bodyPartTest.cpp | Skip delimiter lines that are not exactly equal to the boundary | 2019-10-05 11:37:09 +02:00 |
| bodyTest.cpp | Avoid force-encoding display names that fit within qcontent | 2020-12-11 23:10:39 +01:00 |
| charsetFilteredOutputStreamTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| charsetTest.cpp | vmime: avoid changing SEVEN_BIT when encoding::decideImpl sees U+007F (#303) | 2024-05-21 15:45:29 +02:00 |
| charsetTestSuites.hpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| datetimeTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| dispositionTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| emailAddressTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| emptyContentHandlerTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| fileContentHandlerTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| headerFieldTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| headerTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| htmlTextPartTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| mailboxGroupTest.cpp | Fixed possible recursion crash when parsing mailbox groups. | 2022-01-25 10:28:20 +01:00 |
| mailboxListTest.cpp | Fixed possible recursion crash when parsing mailbox groups. | 2022-01-25 10:28:20 +01:00 |
| mailboxTest.cpp | vmime: prevent loss of a space during text::createFromString (#306) | 2024-05-21 15:55:06 +02:00 |
| mediaTypeTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| messageIdSequenceTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| messageIdTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| messageTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| parameterTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| pathTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| streamContentHandlerTest.cpp | Code style and clarity. | 2018-09-05 23:54:48 +02:00 |
| stringContentHandlerTest.cpp | Removed 'stringProxy' since COW std::string is no longer valid in C++11. | 2018-09-15 07:41:26 +02:00 |
| textTest.cpp | vmime: prevent loss of a space during text::createFromString (#306) | 2024-05-21 15:55:06 +02:00 |
| wordEncoderTest.cpp | Fixes/comments for guessBestEncoding (#304) | 2024-05-21 15:47:05 +02:00 |