Commit c15cc63

Add double-transcode optimization section to the paper
1 parent 72861ec commit c15cc63

paper/P2728.md

Lines changed: 175 additions & 0 deletions
@@ -1378,6 +1378,181 @@ std::basic_string<ToCharT> transcode_to(std::basic_string<FromCharT> const& inpu
}
```

## Optimizing for Double-Transcoding

In generic code, it's possible to introduce transcoding views that wrap other transcoding
views:

```c++
void foo(std::ranges::view auto v) {
#ifdef _MSC_VER
  windows_function(v | std::views::to_utf16);
#endif
  // ...
}

int main(int, char const* argv[]) {
  foo(std::null_term(argv[1]) | std::views::as_char8_t | std::views::to_utf32);
}
```

In the above example, if the user is building on Windows, `foo` will create a
`to_utf16_view` wrapping a `to_utf32_view`.

### Eliding the Inner Transcoding View in the CPO

You might want to add logic in the CPO such that it notices that `foo` is creating a
`to_utf16_view` wrapping a `to_utf32_view`, elides the inner `to_utf32_view`, and creates
the `to_utf16_view` directly wrapping the view produced by `as_char8_t`.

However, this runs into issues where the result of `base()` isn't what the user
expects. Consider this transcode function that works similarly to `std::ranges::copy`, in
that it returns both the output iterator and the final position of the input iterator:

```c++
template <typename I, typename O>
using transcode_result = std::ranges::in_out_result<I, O>;

template <std::input_iterator I, std::sentinel_for<I> S, std::output_iterator<char32_t> O>
transcode_result<I, O> transcode_to_utf32(I first, S last, O out) {
  auto r = std::ranges::subrange(first, last) | std::views::to_utf32;

  auto copy_result = std::ranges::copy(r, out);

  return transcode_result<I, O>{copy_result.in.base(), copy_result.out};
}
```

If `copy_result.in.base()` has a different type than `first`, which is what eliding the
inner view in the CPO would cause, this will break.

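
The type mismatch can be reproduced without any transcoding machinery. The sketch below is a toy model, not the proposed wording: `toy_transcoding_iter`, `wrap_plain`, and `wrap_eliding` are hypothetical names, and only the relationship between a wrapper and its `base()` is modelled.

```c++
#include <cassert>
#include <type_traits>

// Hypothetical toy iterator wrapper; only the base() relationship is modelled.
template <class It>
struct toy_transcoding_iter {
    It current;
    It base() const { return current; }
};

template <class T>
inline constexpr bool is_toy_transcoding_iter = false;
template <class It>
inline constexpr bool is_toy_transcoding_iter<toy_transcoding_iter<It>> = true;

// Without elision: always wrap exactly what the caller passed in.
template <class It>
auto wrap_plain(It it) {
    return toy_transcoding_iter<It>{it};
}

// With CPO-style elision: drop an inner wrapper and wrap its base instead.
template <class It>
auto wrap_eliding(It it) {
    if constexpr (is_toy_transcoding_iter<It>) {
        return toy_transcoding_iter<decltype(it.base())>{it.base()};
    } else {
        return toy_transcoding_iter<It>{it};
    }
}

using caller_iter = toy_transcoding_iter<const char*>;

// base() gives back the caller's iterator type only when nothing was elided.
static_assert(std::is_same_v<
    decltype(wrap_plain(caller_iter{}).base()), caller_iter>);
static_assert(std::is_same_v<
    decltype(wrap_eliding(caller_iter{}).base()), const char*>);
```

With elision, a function like `transcode_to_utf32` that is handed a `caller_iter` gets a `const char*` back from `base()`, so it cannot return the position as the `I` it was given.
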
### "Innermost" Optimization

Instead, the iterator of the transcoding view can "look through" the iterator of the inner
transcoding view that it's wrapping. Since the iterator is just a backpointer to the
parent and an iterator to the current position, this optimization points the backpointer
at its parent's parent and uses the inner iterator of the wrapped iterator as the current
position. We use exposition-only type aliases named `@*innermost-parent*@` and
`@*innermost-base*@` to explain how this works in the wording.

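
The idea can be sketched with a toy iterator that stores only the innermost position but still reports the wrapped iterator type from `base()`. This is an illustrative model under simplifying assumptions, not the proposed wording: `look_through_iter`, `innermost`, and `from_innermost_t` are hypothetical names, and the parent backpointer is omitted.

```c++
#include <cassert>
#include <type_traits>

template <class It>
struct look_through_iter;

// Computes the innermost non-wrapper iterator type (hypothetical helper).
template <class T>
struct innermost { using type = T; };
template <class It>
struct innermost<look_through_iter<It>> : innermost<It> {};

template <class T>
inline constexpr bool is_look_through_iter = false;
template <class It>
inline constexpr bool is_look_through_iter<look_through_iter<It>> = true;

// Tag selecting the constructor that takes an innermost position directly.
struct from_innermost_t {};

template <class It>
struct look_through_iter {
    using innermost_type = typename innermost<It>::type;
    static constexpr bool optimizing = is_look_through_iter<It>;

    // Stored state is the innermost position, never a nested wrapper.
    innermost_type current;

    explicit look_through_iter(It it) : current(unwrap(it)) {}
    look_through_iter(from_innermost_t, innermost_type cur) : current(cur) {}

    // base() still returns the type the caller wrapped, rebuilt on demand.
    It base() const {
        if constexpr (optimizing) {
            return It(from_innermost_t{}, current);
        } else {
            return current;
        }
    }

private:
    static innermost_type unwrap(const It& it) {
        if constexpr (optimizing) {
            return it.current;
        } else {
            return it;
        }
    }
};

using one_layer = look_through_iter<const char*>;
using two_layers = look_through_iter<one_layer>;
static_assert(std::is_same_v<two_layers::innermost_type, const char*>);
```

Unlike elision in the CPO, the nested type `two_layers` is preserved, so code that calls `base()` sees the iterator type it passed in while iteration itself touches only the innermost position.
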
The wording change that would enable this optimization is as follows:

#### Additional helper templates

```diff
+ template<class T>
+   concept @*to-utf-view-iterator-optimizable*@ = @*unspecified*@; // @*exposition only*@
+ template<class T>
+   concept @*to-utf-view-sentinel-optimizable*@ = @*unspecified*@; // @*exposition only*@
```

These concepts are satisfied when the type in question is the iterator or sentinel,
respectively, of a transcoding view.

#### Exposition-only class template `@*to-utf-view-impl*@`

```diff
- using @*Parent*@ = @*maybe-const*@<Const, @*to-utf-view-impl*@>; // @*exposition only*@
- using @*Base*@ = @*maybe-const*@<Const, V>; // @*exposition only*@
+ using @*innermost-parent*@ = @*unspecified*@; // @*exposition only*@
+ using @*innermost-base*@ = @*unspecified*@; // @*exposition only*@
+ static constexpr bool @*optimizing*@{@*to-utf-view-iterator-optimizable*@<iterator_t<@*Base*@>>};
```

#### Class `@*to-utf-view-impl*@::@*iterator*@`

```diff
- iterator_t<@*Base*@> current_ = iterator_t<@*Base*@>(); // @*exposition only*@
- @*Parent*@* parent_ = nullptr; // @*exposition only*@
+
+ iterator_t<@*innermost-base*@> current_ = iterator_t<@*innermost-base*@>(); // @*exposition only*@
+ @*innermost-parent*@* parent_ = nullptr; // @*exposition only*@
```

```diff
- constexpr @*iterator*@(@*Parent*@& parent, iterator_t<@*Base*@> begin)
-   : current_(std::move(begin)),
-     parent_(addressof(parent)) {
-   if (base() != @*end*@())
-     @*read*@();
-   else if constexpr (!forward_range<@*Base*@>) {
-     buf_index_ = -1;
-   }
- }
+
+ constexpr @*iterator*@(@*innermost-parent*@& parent, iterator_t<@*innermost-base*@> begin)
+   : current_(std::move(begin)),
+     parent_(std::addressof(parent))
+ {
+   if (current_ != @*end*@())
+     @*read*@();
+   else if constexpr (!forward_range<@*Base*@>) {
+     buf_index_ = -1;
+   }
+ }
+
+ constexpr @*iterator*@(@*Parent*@& parent, iterator_t<@*Base*@> begin) requires @*optimizing*@
+   : current_(std::move(begin.current_)), parent_(begin.parent_) {
+   if (current_ != @*end*@())
+     @*read*@();
+   else if constexpr (!forward_range<@*Base*@>) {
+     buf_index_ = -1;
+   }
+ }
```

```diff
- constexpr const iterator_t<@*Base*@>& base() const& noexcept { return current_; }
+ constexpr iterator_t<@*Base*@> base() const& noexcept requires forward_range<@*Base*@>
+ {
+   if constexpr (@*optimizing*@) {
+     return iterator_t<@*Base*@>{*parent_, current_};
+   } else {
+     return current_;
+   }
+ }

- constexpr iterator_t<@*Base*@> base() && { return std::move(current_); }
+ constexpr iterator_t<@*Base*@> base() && {
+   if constexpr (@*optimizing*@) {
+     return iterator_t<@*Base*@>{*parent_, std::move(current_)};
+   } else {
+     return std::move(current_);
+   }
+ }
```

```diff
- constexpr sentinel_t<@*Base*@> @*end*@() const { // @*exposition only*@
+ constexpr sentinel_t<@*innermost-base*@> @*end*@() const { // @*exposition only*@
    return end(parent_->base_);
  }
```

#### Class `@*to-utf-view-impl*@::@*sentinel*@`

```diff
- using @*Parent*@ = @*maybe-const*@<Const, @*to-utf-view-impl*@>; // @*exposition only*@
- using @*Base*@ = @*maybe-const*@<Const, V>; // @*exposition only*@
- sentinel_t<@*Base*@> end_ = sentinel_t<@*Base*@>();
+
+ using @*innermost-parent*@ = @*unspecified*@; // @*exposition only*@
+ using @*innermost-base*@ = @*unspecified*@; // @*exposition only*@
+ sentinel_t<@*innermost-base*@> end_ = sentinel_t<@*innermost-base*@>();
+ static constexpr bool @*optimizing*@{@*to-utf-view-sentinel-optimizable*@<sentinel_t<@*Base*@>>};
```

```diff
+ constexpr explicit @*sentinel*@(sentinel_t<@*Base*@> end) requires @*optimizing*@
+   : end_{end.end_} {}
```

```diff
+ constexpr sentinel_t<@*Base*@> base() const requires @*optimizing*@
+ {
+   return sentinel_t<@*Base*@>{end_};
+ }
```

# Changelog

## Changes since R8
