125 Commits

Author SHA1 Message Date
Jan Engelhardt
aa60d00496
Fix a test failure in testNewFromString (#311)
Fixes an oversight in d296c2d1.
2024-06-11 20:47:53 +02:00
Jan Engelhardt
d296c2d1d5
vmime: prevent loss of a space during text::createFromString (#306)
```
mailbox(text("Test München West", charsets::UTF_8), "a@b.de").generate();
```

produces

```
=?us-ascii?Q?Test_?= =?utf-8?Q?M=C3=BCnchen?= =?us-ascii?Q?West?= <a@b.de>
```

The first space between ``Test`` and ``München`` is encoded as an
underscore along with the first word: ``Test_``. The second space
between ``München`` and ``West`` is encoded with neither of the two
words and thus lost. Decoding the text results in ``Test
MünchenWest`` instead of ``Test München West``.

This is caused by how ``vmime::text::createFromString()`` handles
transitions between 7-bit and 8-bit words: If an 8-bit word follows a
7-bit word, a space is appended to the previous word. The opposite
case of a 7-bit word following an 8-bit word *misses* this behaviour.

When one fixes this problem, a follow-up issue appears:

``text::createFromString("a b\xFFc d")`` tokenizes the input into
``m_words={word("a "), word("b\xFFc ", utf8), word("d")}``. This
"right-side alignment" nature of the whitespace is a problem for
word::generate():

As per RFC 2047, spaces between adjacent encoded words are just
separators but not meant to be displayed. A space between an encoded
word and a regular ASCII text is not just a separator but also meant
to be displayed.

When word::generate() outputs the b-word, it would have to strip one
space, but only when there is a transition from encoded-word to
unencoded word. word::generate() does not know whether d will be
encoded or unencoded.

The idea now is that we could change the tokenization of
``text::createFromString`` such that whitespace is at the *start* of
words rather than at the end. With that, word::generate() need not
know anything about the next word, but rather only the *previous*
one.

Thus, in this patch,

1. The tokenization of ``text::createFromString`` is changed to
   left-align spaces and the function is fixed to account for
   the missing space on transition.
2. ``word::generate`` learns how to steal a space character.
3. Testcases are adjusted to account for the shifted
   position of the space.

Fixes: #283, #284

Co-authored-by: Vincent Richard <vincent@vincent-richard.net>
2024-05-21 15:55:06 +02:00
Jan Engelhardt
c105165c6e
tests: switch a byte sequence in textTest (#305)
Switch out the byte sequence for one that is similarly random, but one
which happens to decode as valid UTF-8, such that the expected and
actual strings are shown with reasonable characters on a terminal.
2024-05-21 15:48:26 +02:00
Jan Engelhardt
b447adbe37
Fixes/comments for guessBestEncoding (#304)
* tests: add case for getRecommendedEncoding

* vmime: avoid integer multiply wraparound in wordEncoder::guessBestEncoding

If the input string is 42949673 characters long or larger, there will
be integer overflow on 32-bit platforms when multiplying by 100.
Switch that one computation to floating point.

* vmime: update comment in wordEncoder::guessBestEncoding
2024-05-21 15:47:05 +02:00
Jan Engelhardt
97d15b8cd7
vmime: avoid changing SEVEN_BIT when encoding::decideImpl sees U+007F (#303)
* vmime: avoid changing SEVEN_BIT when encoding::decideImpl sees U+007F

Do not switch to QP/B64 when encountering U+007F.
U+007F is part of ASCII just as much as U+0001 is.

---------

Co-authored-by: Vincent Richard <vincent@vincent-richard.net>
2024-05-21 15:45:29 +02:00
vincent-richard
561746081f Fixed possible recursion crash when parsing mailbox groups. 2022-01-25 10:28:20 +01:00
ibanic
5d78d879bb Prevent accessing empty buffer 2021-05-15 22:32:24 +02:00
Jan Engelhardt
f4c611b736 Avoid force-encoding display names that fit within qcontent
When the display name contains an At sign, or anything of the sort,
libvmime would forcibly encode this to =?...?=, even if the line
is fine ASCII which only needs quoting.

rspamd takes excessive quoting as a sign of spam and penalizes
such mails by raising the score (rule/match: TO_EXCESS_QP et al.)
2020-12-11 23:10:39 +01:00
vincent-richard
5c00f7867a #238 Fixed whitespace between encoded words 2020-06-16 19:47:33 +02:00
vincent-richard
9a10a839ec Added test. 2020-06-02 18:13:34 +02:00
Jan Engelhardt
b06e9e6f86 Skip delimiter lines that are not exactly equal to the boundary
There is crap software out there that generates mails violating the
prefix ban clause from RFC 2046 §5.1 ¶2.

Switch vmime from a prefix match to an equality match, similar to
what Alpine and Thunderbird do too.
2019-10-05 11:37:09 +02:00
Jan Engelhardt
df32418df5 Disregard whitespace between leading boundary hyphens and marker
The way I read the RFC is that whitespace is not allowed before the
boundary marker, only afterwards, so the checks for leading WS are
removed, and the missing check for trailing WS is added.

See RFC 2046 §5.1.1: """The boundary delimiter line is then defined
as a line consisting entirely of two hyphen characters ("-", decimal
value 45) followed by the boundary parameter value from the
Content-Type header field, optional linear whitespace, and a
terminating CRLF."""
2019-10-05 11:31:51 +02:00
Jan Engelhardt
d1190b496f Improve address parser for malformed mailbox specifications
Spammers use "Name <addr> <addr>" to trick some parsers.
My expectations as to what the outcome should be are presented
in the updated mailboxTest.cpp.

The DFA in mailbox::parseImpl is hereby redone so as to pick the
rightmost address-looking portion as the address, rather than
something in between. While doing so, it also no longer mangles
the name part (it does this by keeping an "as_if_name" variable
around until the end).
2019-01-25 08:11:07 +01:00
Jan Engelhardt
cc18aa39c1 tests: add more malformation tests to mailboxTest 2019-01-24 13:17:52 +01:00
Vincent Richard
df135b5a8b Removed 'stringProxy' since COW std::string is no longer valid in C++11. 2018-09-15 07:41:26 +02:00
Vincent Richard
b55bdc9c0b Code style and clarity. 2018-09-05 23:54:48 +02:00
Vincent Richard
f173b0a535 Avoid copy by passing shared_ptr<> with const reference. 2018-08-18 16:08:25 +02:00
Vincent Richard
abba40e97d Added unit test related to PR #192. 2018-03-12 20:33:27 +01:00
Vincent Richard
c53e914ea5 Always ignore newlines between words. 2017-01-02 21:40:38 +01:00
Vincent Richard
5424aa2381 Fixed #149: don't lose charset when fixing invalid broken words. 2016-11-05 13:31:54 +01:00
Vincent Richard
4fd8976515 Issue #126: more warnings fixed. 2016-03-13 20:15:22 +01:00
Vincent Richard
c446afddd4 Estimate generated size of parameterized field. 2015-06-07 21:32:44 +02:00
Vincent Richard
e88b8eeac2 Fixed parsing of UTF8 email addresses (RFC-2047 local part + IDNA domain name). 2015-05-03 19:17:00 +02:00
Vincent Richard
19321f9026 Fixed unit test so that it does not depend on the current locale charset. 2015-02-19 21:24:41 +01:00
Vincent Richard
c5c66f9fdc Issue #103: fix badly encoded words. 2015-02-16 18:43:03 +01:00
Vincent Richard
e7739c0efe Fixed issue #98: support for wrongly padded B64 words. 2015-01-14 19:35:28 +01:00
Vincent Richard
03a0e36e91 Added support for language specification in RFC-2047 encoded words and RFC-2231 parameter values. 2014-06-30 22:48:42 +02:00
Vincent Richard
0863f50c26 Allow choosing between encoding modes for parameter values. 2014-06-17 21:08:22 +02:00
Vincent Richard
4aefcca374 Removed useless 'virtual' inheritance (fixed issue #84). 2014-06-06 19:26:01 +02:00
Vincent Richard
30ea54f269 Fixed parsing of empty lines in header field value. 2014-06-01 20:46:17 +02:00
Vincent Richard
ef892af655 Do not make calls to setlocale() in a library. Use default user locale in tests and examples. 2014-01-16 00:27:51 +01:00
Vincent Richard
7e265b05f4 Simplified types for better readability. Use appropriate types (size_t, byte_t...). Minor warning fixes. 2013-12-10 08:52:51 +01:00
Vincent Richard
f9913fa28a Boost/C++11 shared pointers. 2013-11-21 22:16:57 +01:00
Vincent Richard
29954e5e50 Fixed group parsing in mailboxList. 2013-10-16 19:47:24 +02:00
Vincent Richard
b886cd4864 Refactored the way embedded objects are referenced in MHTML. 2013-07-11 18:06:26 +02:00
Vincent Richard
86f0a63802 Do not QP-encode CRLFs when content type is text. 2013-06-27 13:56:55 +02:00
Vincent Richard
de659db112 Removed debug printf. 2013-06-27 07:54:33 +02:00
Vincent Richard
1a30cfe41b Unit tests for content handlers. 2013-06-26 21:41:42 +02:00
Vincent Richard
895b07cae9 Added support for SIZE SMTP extension (RFC-1870). 2013-06-24 15:32:40 +02:00
Vincent Richard
2e5574b146 Added support for transport padding in boundary (issue #38). 2013-06-13 12:00:42 +02:00
Vincent Richard
02e1cf65ab Fixed comment. 2013-06-09 10:24:56 +02:00
Vincent Richard
9d2703c376 Added support for charset conversion with ICU (thanks to Mehmet Bozkurt). 2013-03-25 12:32:48 +01:00
Vincent Richard
32eb1ebe34 Strip spaces at end of header lines (Zarafa). 2013-03-24 15:50:16 +01:00
Vincent Richard
21945be4c4 Fixed warnings and 64-bit issues. 2013-03-24 12:30:26 +01:00
Vincent Richard
495526a5e6 Let whitespace break the value of a parameterized header field, not just a ';' (thanks to Zarafa). 2013-03-24 11:35:08 +01:00
Vincent Richard
84415da8e1 Fixed parsing header field value on next line. 2013-03-24 10:02:23 +01:00
Vincent Richard
da2797702f Updated tests for charset conversion.
Added test for UTF-7 encoding availability. Added test for input buffer
underflow in charsetFilteredOutputStream. Refactored charset conversion
tests and removed useless tests.
2013-03-18 09:35:04 +01:00
Vincent Richard
32a80f6c1e Fixed mailbox and mailbox group parsing. Added unit tests. 2013-03-11 10:05:09 +01:00
Vincent Richard
1df8c6cd0e Refactored unit tests. 2013-03-08 08:19:55 +01:00
Vincent Richard
8378b350df Throw exception when an invalid value type is set in a header field. 2013-02-27 14:59:37 +01:00