the lifetime of the variant members of union class objects operated by memcpy #5213

Open
xmh0511 opened this issue Jan 18, 2022 · 18 comments

Comments

@xmh0511
Contributor

xmh0511 commented Jan 18, 2022

Consider this example

union U{
   int i;
   char c;
};
U u1 = {.i = 1};  // #1
U u2 = {.c = 0};  // #2
std::memcpy(&u2, &u1, sizeof(U));  // #3

At #1, the variant member U::i of the object u1 is active, while at #2, the variant member U::c of the object u2 is active. Since [cstring.syn] p3 says

The functions memcpy and memmove are signal-safe. Both functions implicitly create objects ([intro.object]) in the destination region of storage immediately prior to copying the sequence of characters to the destination.

Thus, before the copying starts, what can be sure is that an object of type U is implicitly created in the specified storage. In addition, the lifetime of either U::i or U::c can start as part of that implicit object creation, but not both, as required by [class.union.general] p2:

At most one of the non-static data members of an object of union type can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.

Immediately prior to copying, we cannot determine which variant member has begun its lifetime in the newly created union object that occupies the storage of u2. After the copying, we still cannot determine which one is active. Is it U::i, because the corresponding variant member is active in the source? Or is it U::c, since it is the one selected by the "implicitly create objects" operation?
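
As a minimal sketch of the part that is not in dispute, the following checks only that the bytes of u2 match those of u1 after the copy. The function name is hypothetical, and the code deliberately reads neither u2.i nor u2.c, since which of them is active is exactly the open question. It assumes C++20 for the designated initializers.

```cpp
#include <cassert>
#include <cstring>

union U {
    int i;
    char c;
};

// Hypothetical helper: copies u1 over u2 with memcpy and reports whether
// the two object representations are byte-for-byte identical afterwards.
// Only the bytes are compared; no variant member of u2 is accessed.
bool representations_match_after_memcpy() {
    U u1 = {.i = 1};  // U::i is active
    U u2 = {.c = 0};  // U::c is active
    std::memcpy(&u2, &u1, sizeof(U));
    return std::memcmp(&u2, &u1, sizeof(U)) == 0;
}
```

This says nothing about member lifetimes; it only pins down what memcpy does to the storage.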

@xmh0511 xmh0511 changed the title the lifetime of the variant member of union class objects operated by memcpy the lifetime of the variant members of union class objects operated by memcpy Jan 18, 2022
@xmh0511
Contributor Author

xmh0511 commented Jan 18, 2022

This may be related to issue #5193.

@frederick-vs-ja
Contributor

frederick-vs-ja commented Jan 19, 2022

Thus, before the copying starts, what can be sure is that an object of type U is implicitly created in the specified storage.

That cannot be known for sure according to [intro.object] p10 (... that operation implicitly creates and starts the lifetime of zero or more objects of implicit-lifetime types ...), because the invocation may even create no object (in which case u2.c is still active). The non-determinism has been mentioned here.

I'm not sure whether implicit object creation can end the lifetime of a living object. If so, the effect of the memcpy call is non-deterministic: it may create no object, or a new U object whose U::i or U::c is active, or create a new object of a different type in the target storage. But if implicit object creation can't end the lifetime of a living object, we can be sure that the active member of u2 is unchanged.

Consider this example which may be unrelated to the existing active member:

union U { int i; char c; };

U u1{.i = 1};
alignas(U) unsigned char buf[sizeof(U)]; // #1
std::memcpy(buf, &u1, sizeof(U)); // #2

auto p = std::launder(reinterpret_cast<U*>(buf)); // #3
auto consume = [](auto){};
if (std::rand() < RAND_MAX / 2)
    consume(p->i); // #4
else 
    consume(p->c); // #5

Here #1 and #2 implicitly create objects in buf, but it's non-deterministic which objects are created. Then #3 requires that a U object is created, but its active member is non-deterministic. Finally #4 and #5 determine which member of *p is created.
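
A compilable sketch of the single path that reads p->i is given below. The function name is hypothetical, and whether the read is well-defined depends on the interpretation of [intro.object] p10 proposed in this comment, which the rest of the thread debates. It assumes C++20.

```cpp
#include <cassert>
#include <cstring>
#include <new>  // std::launder

union U { int i; char c; };

// Hypothetical helper: copies a U whose active member is i into a raw
// byte buffer and reads i back through a laundered pointer.
// Assumption: implicit object creation in buf yields a U with i active,
// which is the reading under discussion, not settled wording.
int read_i_from_copied_bytes() {
    U u1{.i = 1};
    alignas(U) unsigned char buf[sizeof(U)];
    std::memcpy(buf, &u1, sizeof(U));
    auto p = std::launder(reinterpret_cast<U*>(buf));
    return p->i;
}
```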

@xmh0511
Contributor Author

xmh0511 commented Jan 19, 2022

I'm not sure whether implicit object creation can end the lifetime of a living object.

IMO, if the implicit object creation operation creates at least one object, then the lifetime of that living object ends, since the newly created object reuses the storage occupied by that living object.

Finally #4 and #5 determine which member of *p is created.

I'm not 100% sure that I correctly understand [intro.object] p10.

If no such set of objects would give the program defined behavior, the behavior of the program is undefined.

So, in order to make the program be well-defined, the lifetime of U::i and U::c should both be alive since either of them is possible to be evaluated. According to [class.union], at most one member can be active at any time. Hence, there is no set of objects that would give the program defined behavior (i.e., implicit object creation cannot make both variant members alive to guarantee that the program is well-defined), and thus the behavior of the program is undefined.


Incidentally, it seems reasonable that the behavior of a memcpy from one union object to another should be consistent with that of the implicitly-defined copy assignment operator for a union class. They have one thing in common: both copy the object representation.
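
To illustrate the common point, here is a small sketch comparing the two operations at the byte level. The function and variable names are hypothetical, and the comparison covers only the object representation, not which member ends up active. It assumes C++20 for the designated initializers.

```cpp
#include <cassert>
#include <cstring>

union U { int i; char c; };

// Hypothetical helper: returns true if the implicitly-defined copy
// assignment operator and memcpy leave their destinations with the
// same bytes when given the same source.
bool assignment_and_memcpy_agree() {
    U src = {.i = 1};
    U by_assignment = {.c = 0};
    U by_memcpy = {.c = 0};
    by_assignment = src;                        // copies the object representation
    std::memcpy(&by_memcpy, &src, sizeof(U));   // copies the same bytes
    return std::memcmp(&by_assignment, &by_memcpy, sizeof(U)) == 0;
}
```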

@frederick-vs-ja
Contributor

So, in order to make the program be well-defined, the lifetime of U::i and U::c should both be alive since either of them is possible to be evaluated.

I don't think so. IIUC only one member needs to be alive, as only one of #4 and #5 is evaluated, and [intro.object] p10 permits such an understanding:

  • If #4 is evaluated, then the set of created objects includes *p and p->i, otherwise
  • #5 is evaluated, and the set of created objects includes *p and p->c.

In every execution of the program, only one of these two mutually exclusive cases is encountered, so the program has defined behavior.

@xmh0511
Contributor Author

xmh0511 commented Jan 19, 2022

The implicitly creating objects operation is at #2, at this point, how do you know which substatement of the below if statement will be executed? Informally, that is not known until run time. In addition, [basic.life] p1 does not admit the operation of implicitly creating objects can start the lifetime of a union member

except that if the object is a union member or subobject thereof, its lifetime only begins if that union member is the initialized member in the union ([dcl.init.aggr], [class.base.init]), or as described in [class.union] and [class.copy.ctor], and except as described in [allocator.members].

As I said, memcpy operation for a union class should be regulated with a special rule to clear the hazy here.

@frederick-vs-ja
Contributor

frederick-vs-ja commented Jan 19, 2022

The implicitly creating objects operation is at #2, at this point, how do you know which substatement of the below if statement will be executed?

Yes, we can't know which substatement will be executed, and can't determine the set of created objects at #2. But I don't think the set is needed to be determined at this point.

In addition, [basic.life] p1 does not admit the operation of implicitly creating objects can start the lifetime of a union member

I don't think [basic.life] p1 covers the cases of implicit object creation, although the current wording is confusing to me.
If we consider that [basic.life] p1 does not admit that the operation of implicitly creating objects can start the lifetime of a union member, then we should also consider that the previous two bullets restrict the operations that start the lifetime of objects:

(1.1) - storage with the proper alignment and size for type T is obtained, and
(1.2) - its initialization (if any) is complete (including vacuous initialization) ([dcl.init]),

It's arguable whether the wording permits implicit object creation, because the latter never performs initialization.

As I said, memcpy operation for a union class should be regulated with a special rule to clear the hazy here.

The wording here might be hazy, but I don't think that unions are special. IMO the wording of [basic.life] should be harmonized with [intro.object].

@xmh0511
Contributor Author

xmh0511 commented Jan 19, 2022

But I don't think the set is needed to be determined at this point.

In my opinion, such an operation should scan the subsequent code to determine, once and for all, what the set is. It seems that you consider such a process is dynamic, looks like the determination of the set will be deferred and such set will be dynamically changed according to what the concrete subsequent execution needs(the set includes *p and p->i when executing #4, conversely, the set includes *p and p->c when executing #5).

but I don't think that unions are special

Unions are special. For ordinary member subobjects, we don't need to be concerned about which of them can be created and have their lifetimes started, as long as they are of implicit-lifetime types and are needed by the program. However, as noted above, we do need to be concerned about this when they are variant members.


If your interpretation of "set of implicitly creating objects" is correct, I think consume(p->c); at #5 is always UB since p->c always has an indeterminate value according to [basic.indet], even if the object is implicitly created and has its lifetime begun. Rather, consume(p->i); may be ok if p->i is alive, and according to [basic.types.general] p3, p->i holds the same value as u1.i.
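
For reference, the [basic.types.general] p3 guarantee cited above can be sketched for a plain int, where no union is involved. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: copies the underlying bytes of one int into
// another int. For a trivially copyable type, the destination then
// holds the same value as the source.
int copy_int_via_memcpy(int source) {
    int destination = 0;
    std::memcpy(&destination, &source, sizeof(int));
    return destination;
}
```

Whether the same reasoning carries over to p->i depends on p->i being within its lifetime, which is the point in question.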

@frederick-vs-ja
Contributor

frederick-vs-ja commented Jan 20, 2022

Oh... I should use unsigned char or std::byte instead. It seems that some rules for indeterminate values are weird (CWG1997) but still well-defined.

It seems that you consider such a process is dynamic, looks like the determination of the set will be deferred and such set will be dynamically changed according to what the concrete subsequent execution needs(the set includes *p and p->i when executing #4, conversely, the set includes *p and p->c when executing #5).

I guess such a set doesn't need to be dynamically changed although it can't be statically determined. The set of "possible sets of implicitly created objects within a given storage by a given explicit operation" may be considered dynamically shrinking. When all operations on the storage have been done, every element in "that set of sets" can be considered as "the set of implicitly created objects" (which is unchanged during these operations), and no UB happens if "that set of sets" hasn't become empty.

@xmh0511
Contributor Author

xmh0511 commented Jan 20, 2022

There is an example that can prove

may be considered dynamically shrinking

is not true

if (std::rand() < RAND_MAX / 2) {
    consume(p->i); // #4
    goto label;
} else {
label:
    consume(p->c); // #5
}

In this example, once the condition is true, the set shall include both p->i and p->c. Conversely, if the condition is false, the set shall include p->c. Maybe, you could arguably say: if the first substatement was hit, the program is UB; otherwise, the program is well-defined.

@frederick-vs-ja
Contributor

No. The dynamically shrinking set I meant is the set of possible sets of implicitly created objects, whose elements are sets of objects.

Given the example you showed

if (std::rand() < RAND_MAX / 2) {
    consume(p->i); // #4
    goto label;
} else {
label:
    consume(p->c); // #5
}

IMO before the execution of the statement (the set at that point is called I for exposition),

  • some elements (forming subset X for exposition) of that set include p->i, and
  • some elements (forming subset Y for exposition) of that set include p->c, and
  • other elements (forming subset Z for exposition) include neither p->i nor p->c.

Then X, Y, and Z are pairwise disjoint, and I is the union of X, Y, and Z. No element of I includes both p->i and p->c.

If the condition is true, then the set of possible sets of implicitly created objects shrinks from I to X in #4, and then shrinks to empty in #5, which causes UB. Conversely, if the condition is false, the set of possible sets of implicitly created objects shrinks from I to Y in #5 and the program may have well-defined behavior.
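
The shrinking described above can be sketched as a toy model. Everything here (the bitmask encoding, the names Member, ObjectSet, access, trace_is_defined) is hypothetical and only mirrors the informal argument; it is not a statement of what the standard specifies. *p is left out of the encoding because it belongs to every candidate.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model only: an "object set" is a bitmask over the two variant members.
enum Member : unsigned { kNone = 0, kI = 1u << 0, kC = 1u << 1 };
using ObjectSet = unsigned;

// Initial candidates, standing in for X (has p->i), Y (has p->c) and
// Z (has neither). No candidate holds both members, since at most one
// variant member can be active.
std::vector<ObjectSet> initial_candidates() { return {kI, kC, kNone}; }

// Accessing a member keeps only the candidates in which it is alive.
void access(std::vector<ObjectSet>& candidates, Member member) {
    candidates.erase(
        std::remove_if(candidates.begin(), candidates.end(),
                       [member](ObjectSet s) { return (s & member) == 0; }),
        candidates.end());
}

// Replays a sequence of member accesses. Returns true if at least one
// candidate set survives, i.e. the trace has defined behavior in this model.
bool trace_is_defined(const std::vector<Member>& trace) {
    std::vector<ObjectSet> candidates = initial_candidates();
    for (Member m : trace) access(candidates, m);
    return !candidates.empty();
}
```

In this model, the trace for a false condition is {kC} and survives, while the trace for a true condition is {kI, kC} and leaves no candidate.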

@xmh0511
Contributor Author

xmh0511 commented Jan 20, 2022

Ah, after reading [intro.object] p10 more carefully, you're correct. I would prefer to consider that I contains no Z rather than that it "shrinks to empty in #5"; the latter is a bit obscure.

The operation can implicitly create and start the lifetime for either p->i or p->c but not both, thus the operation can construct the sets:

  • X: { *p, p->i }
  • Y: { *p, p->c }

Since the lifetimes of p->i and p->c are mutually exclusive, the operation cannot construct the set Z: { *p, p->i, p->c } in any way. Hence, I eventually is {X, Y} (namely, { {*p, p->i}, {*p, p->c} }). If the condition were true, it would require the set of objects {*p, p->i, p->c}; since we cannot find that set in I, that execution would be UB, as per:

If no such set of objects would give the program defined behavior, the behavior of the program is undefined.

@languagelawyer
Contributor

It seems that some rules for indeterminate values are weird (CWG1997)

Every time I see this issue, I facepalm from its inconsistent/half-baked logic.
If «no new storage is being obtained for the int object created by the new-expression», then why care which value would be read through *ip, when this would be an access to an out-of-lifetime object? [basic.life]/1:

The lifetime of an object of type T begins when:
— storage with the proper alignment and size for type T is obtained

No (new) storage is obtained == no lifetime is started.

I'd say this issue should be closed as NAD with a comment that «obtained ≠ allocated» (see https://stackoverflow.com/a/58356588).

@frederick-vs-ja
Contributor

The lifetime of an object of type T begins when:
— storage with the proper alignment and size for type T is obtained

Such wording is unchanged since N3337. Is this really consistent with the rule for storage providing (P0137R1) or implicit object creation (P0593R6)?

@languagelawyer
Contributor

What's wrong with this rule from implicit-object creation POV?

@frederick-vs-ja
Contributor

I'd say this issue should be closed as NAD with a comment that «obtained ≠ allocated» (see https://stackoverflow.com/a/58356588).

Let's continue the discussion of obtaining storage here (instead of #4553) ... It'll be confusing to me to just say the same region of storage can be obtained more than once. Perhaps it makes more sense to clarify that obtaining storage and the start of a lifetime are not in one-to-one correspondence.

What's wrong with this rule from implicit-object creation POV?

Maybe I was referring to the next bullet ([basic.life]/1.2) at that time. [basic.life]/1.1 probably has no additional problem with implicit object creation.

If I understand correctly, [basic.life]/1.2 is now problematic, because the lifetime of an implicitly created object starts without any kind of initialization (including vacuous initialization).
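
As a minimal sketch of an object whose lifetime starts with no initialization at all, consider storage returned by malloc. The function name is hypothetical, and the code relies on the C++20 rule that malloc implicitly creates objects of implicit-lifetime types.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: writes and reads an int in malloc'ed storage
// without any new-expression or other explicit initialization.
int roundtrip_through_malloc(int value) {
    void* raw = std::malloc(sizeof(int));
    if (raw == nullptr) return value;  // allocation failed; nothing to demonstrate
    int* p = static_cast<int*>(raw);   // assumes malloc implicitly created an int here
    *p = value;                        // assignment, not initialization
    int result = *p;
    std::free(raw);
    return result;
}
```

No initialization of any kind, vacuous or otherwise, is performed on the int before the assignment, which is the tension with [basic.life]/1.2 described above.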

@languagelawyer
Contributor

It'll be confusing to me to just say the same region of storage can be obtained more than once

It is «obtained [once] per object», not just abstractly «obtained»

If I understand correctly, [basic.life]/1.2 is now problematic, because the lifetime of an implicitly created object starts without any kind of initialization (including vacuous initialization).

The lifetime of implicitly created objects is covered by http://eel.is/c++draft/intro.object#10, not [basic.life]

@frederick-vs-ja
Contributor

It'll be confusing to me to just say the same region of storage can be obtained more than once

It is «obtained [once] per object», not just abstractly «obtained»

This is still confusing. Would it be better to avoid saying "storage of ... is obtained"?

@frederick-vs-ja
Contributor

CWG2675 seems to be related to starting the lifetime of a union member. Although only start_lifetime_as is mentioned in the current description, memcpy should be handled similarly since it also implicitly creates objects.
