author | Jan Engelhardt <[email protected]> | 2024-05-21 13:55:06 +0000 |
---|---|---|
committer | GitHub <[email protected]> | 2024-05-21 13:55:06 +0000 |
commit | d296c2d1d590f8b4f619d9c555ff24ddecec1614 (patch) | |
tree | 49ab8eafbebcd761f9fd304dd888c47187b090c1 /tests/parser/mailboxTest.cpp | |
parent | tests: switch a byte sequence in textTest (#305) (diff) | |
vmime: prevent loss of a space during text::createFromString (#306)
```
mailbox(text("Test München West", charsets::UTF_8), "[email protected]").generate();
```
produces
```
=?us-ascii?Q?Test_?= =?utf-8?Q?M=C3=BCnchen?= =?us-ascii?Q?West?= <[email protected]>
```
The first space between ``Test`` and ``München`` is encoded as an
underscore along with the first word: ``Test_``. The second space
between ``München`` and ``West`` is encoded with neither of the two
words and thus lost. Decoding the text results in
``Test MünchenWest`` instead of ``Test München West``.
This is caused by how ``vmime::text::createFromString()`` handles
transitions between 7-bit and 8-bit words: if an 8-bit word follows a
7-bit word, a space is appended to the previous word. The opposite
case, a 7-bit word following an 8-bit word, *lacks* this handling.
When one fixes this problem, a follow-up issue appears:
``text::createFromString("a b\xFFc d")`` tokenizes the input into
``m_words={word("a "), word("b\xFFc ", utf8), word("d")}``. This
"right-aligned" placement of the whitespace is a problem for
``word::generate()``:
As per RFC 2047, spaces between adjacent encoded words are only
separators and are not meant to be displayed. A space between an
encoded word and regular ASCII text is not just a separator but is
also meant to be displayed.
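The RFC 2047 rule above can be illustrated with a minimal sketch. It is not vmime code: ``Token`` and ``joinDecoded`` are hypothetical names, the tokens are assumed to be already decoded, and each pair of tokens is assumed to have been separated by exactly one space in the header.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type: one header token after decoding.
struct Token {
    std::string text;  // decoded payload (UTF-8 bytes)
    bool encoded;      // true if it appeared as an encoded-word
};

// Join decoded tokens the way an RFC 2047 reader does: whitespace
// between two adjacent encoded-words is dropped, any other separating
// space is part of the displayed text.
std::string joinDecoded(const std::vector<Token>& tokens) {
    std::string out;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        const bool bothEncoded =
            i > 0 && tokens[i - 1].encoded && tokens[i].encoded;
        if (i > 0 && !bothEncoded) {
            out += ' ';
        }
        out += tokens[i].text;
    }
    return out;
}

// Usage: the header from the report loses the second space, because
// that space is carried by neither "München" nor "West".
void exampleJoin() {
    std::vector<Token> broken = {
        {"Test ", true}, {"M\xC3\xBCnchen", true}, {"West", true}};
    assert(joinDecoded(broken) == "Test M\xC3\xBCnchenWest");
}
```

With the space carried inside one of the encoded words, the same join yields ``Test München West``.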
When word::generate() outputs the b-word, it would have to strip one
space, but only when there is a transition from encoded-word to
unencoded word. word::generate() does not know whether d will be
encoded or unencoded.
The idea now is that we could change the tokenization of
``text::createFromString`` such that whitespace is at the *start* of
words rather than at the end. With that, word::generate() need not
know anything about the next word, but rather only the *previous*
one.
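A rough sketch of the left-aligned tokenization, under simplifying assumptions: only the space character counts as whitespace, a word is "8-bit" if any byte has the high bit set, and adjacent chunks of the same class are merged. ``RawWord`` and ``tokenizeLeftAligned`` are hypothetical names and do not reflect the actual implementation of ``text::createFromString``.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type: a run of bytes plus its 7-bit/8-bit class.
struct RawWord {
    std::string bytes;
    bool is8bit;
};

static bool has8bit(const std::string& s) {
    for (unsigned char c : s) {
        if (c & 0x80) return true;
    }
    return false;
}

// Split on spaces, keeping each separating space at the START of the
// chunk that follows it, then merge neighbours of the same class.
std::vector<RawWord> tokenizeLeftAligned(const std::string& in) {
    std::vector<RawWord> out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        const std::size_t start = pos;
        while (pos < in.size() && in[pos] == ' ') ++pos;  // leading spaces
        while (pos < in.size() && in[pos] != ' ') ++pos;  // word body
        const std::string chunk = in.substr(start, pos - start);
        const bool eight = has8bit(chunk);
        if (!out.empty() && out.back().is8bit == eight) {
            out.back().bytes += chunk;
        } else {
            out.push_back({chunk, eight});
        }
    }
    return out;
}

// Usage: the literal is split in two so that the hex escape \xFF does
// not absorb the following 'c'.
void exampleTokenize() {
    const std::vector<RawWord> w = tokenizeLeftAligned("a b\xFF" "c d");
    assert(w.size() == 3);
    assert(w[0].bytes == "a");
    assert(w[1].bytes == " b\xFF" "c" && w[1].is8bit);
    assert(w[2].bytes == " d");
}
```

Every space now belongs to the word on its right, so a generator only needs to remember whether the previous word was encoded.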
Thus, in this patch,
1. The tokenization of ``text::createFromString`` is changed to
left-align spaces and the function is fixed to account for
the missing space on transition.
2. ``word::generate`` learns how to steal a space character.
3. Test cases are adjusted to account for the shifted
position of the space.
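The interplay of steps 1 and 2 might be sketched as follows. This is a simplified stand-in, not vmime's ``word::generate``: ``OutWord``, ``qEncode`` and ``generateWords`` are hypothetical, line folding is ignored, and the Q-encoder keeps only ASCII letters and digits unescaped.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical type: a word with left-aligned whitespace.
struct OutWord {
    std::string bytes;
    std::string charset;
    bool encode;  // emit as an RFC 2047 encoded-word?
};

// Simplified Q-encoding: space -> '_', ASCII alphanumerics verbatim,
// everything else as =XX.
std::string qEncode(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        if (c == ' ') {
            out += '_';
        } else if (c < 0x80 && std::isalnum(c)) {
            out += static_cast<char>(c);
        } else {
            char buf[4];
            std::snprintf(buf, sizeof buf, "=%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Only the PREVIOUS word's state is needed to place each space.
std::string generateWords(const std::vector<OutWord>& words) {
    std::string out;
    bool prevEncoded = false;
    for (std::size_t i = 0; i < words.size(); ++i) {
        std::string payload = words[i].bytes;
        if (!words[i].encode) {
            out += payload;  // the leading space is displayed as-is
            prevEncoded = false;
            continue;
        }
        if (i > 0 && prevEncoded) {
            out += ' ';  // pure separator, the real space stays encoded
        } else if (i > 0 && !payload.empty() && payload[0] == ' ') {
            out += ' ';  // "steal" the space: emit it unencoded
            payload.erase(0, 1);
        }
        out += "=?" + words[i].charset + "?Q?" + qEncode(payload) + "?=";
        prevEncoded = true;
    }
    return out;
}

// Usage: mirrors the two expectations of testSpacing, without the
// mailbox address part.
void exampleGenerate() {
    assert(generateWords({{"Foo", "us-ascii", false},
                          {" B\xC3\xA4renstark", "utf-8", true},
                          {" Baz", "us-ascii", false}})
           == "Foo =?utf-8?Q?B=C3=A4renstark?= Baz");
}
```

When every word is encoded, as in a mailbox display name, the same function keeps each space inside the following encoded word as ``_``.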
Fixes: #283, #284
Co-authored-by: Vincent Richard <[email protected]>
Diffstat (limited to 'tests/parser/mailboxTest.cpp')
-rw-r--r-- | tests/parser/mailboxTest.cpp | 10 |
1 files changed, 10 insertions, 0 deletions
diff --git a/tests/parser/mailboxTest.cpp b/tests/parser/mailboxTest.cpp
index 997a6a38..d1af23f2 100644
--- a/tests/parser/mailboxTest.cpp
+++ b/tests/parser/mailboxTest.cpp
@@ -32,6 +32,7 @@ VMIME_TEST_SUITE_BEGIN(mailboxTest)
 		VMIME_TEST(testSeparatorInComment)
 		VMIME_TEST(testMalformations)
 		VMIME_TEST(testExcessiveQuoting)
+		VMIME_TEST(testSpacing)
 	VMIME_TEST_LIST_END
@@ -184,4 +185,13 @@ VMIME_TEST_SUITE_BEGIN(mailboxTest)
 		VASSERT_EQ("generate", "=?utf-8?Q?Foo_B=40r?= <[email protected]>", a->generate());
 	}
+	void testSpacing() {
+
+		vmime::text t("Foo B\xc3\xa4renstark Baz", vmime::charsets::UTF_8);
+		vmime::mailbox m(t, "[email protected]");
+		VASSERT_EQ("1", "Foo =?utf-8?Q?B=C3=A4renstark?= Baz", t.generate());
+		VASSERT_EQ("2", "=?us-ascii?Q?Foo?= =?utf-8?Q?_B=C3=A4renstark?= =?us-ascii?Q?_Baz?= <[email protected]>", m.generate());
+
+	}
+
 VMIME_TEST_SUITE_END