Flexible Array Members

I’m looking for thoughts on using the typical C pattern in C++. I’m pretty sure it’s currently, officially UB, although there is a proposal:

https://thephd.github.io/vendor/future_cxx/papers/d1039.html

For example, it is used widely for serialisation/deserialisation in the audio codecs in JUCE:

struct BWAVChunk
{
    char description[256];
    char originator[32];
    char originatorRef[32];
    char originationDate[10];
    char originationTime[8];
    uint32 timeRefLow;
    uint32 timeRefHigh;
    uint16 version;
    uint8 umid[64];
    uint8 reserved[190];
    char codingHistory[1];

    void copyTo (StringPairArray& values, const int totalSize) const
    {
        values.set (WavAudioFormat::bwavDescription,     String::fromUTF8 (description,     sizeof (description)));
        values.set (WavAudioFormat::bwavOriginator,      String::fromUTF8 (originator,      sizeof (originator)));
        values.set (WavAudioFormat::bwavOriginatorRef,   String::fromUTF8 (originatorRef,   sizeof (originatorRef)));
        values.set (WavAudioFormat::bwavOriginationDate, String::fromUTF8 (originationDate, sizeof (originationDate)));
        values.set (WavAudioFormat::bwavOriginationTime, String::fromUTF8 (originationTime, sizeof (originationTime)));

        auto timeLow  = ByteOrder::swapIfBigEndian (timeRefLow);
        auto timeHigh = ByteOrder::swapIfBigEndian (timeRefHigh);
        auto time = (((int64) timeHigh) << 32) + timeLow;

        values.set (WavAudioFormat::bwavTimeReference, String (time));
        values.set (WavAudioFormat::bwavCodingHistory,
                    String::fromUTF8 (codingHistory, totalSize - (int) offsetof (BWAVChunk, codingHistory)));
    }

    static MemoryBlock createFrom (const StringPairArray& values)
    {
        MemoryBlock data (roundUpSize (sizeof (BWAVChunk) + values[WavAudioFormat::bwavCodingHistory].getNumBytesAsUTF8()));
        data.fillWith (0);

        auto* b = (BWAVChunk*) data.getData();

        // Allow these calls to overwrite an extra byte at the end, which is fine as long
        // as they get called in the right order..
        values[WavAudioFormat::bwavDescription]    .copyToUTF8 (b->description, 257);
        values[WavAudioFormat::bwavOriginator]     .copyToUTF8 (b->originator, 33);
        values[WavAudioFormat::bwavOriginatorRef]  .copyToUTF8 (b->originatorRef, 33);
        values[WavAudioFormat::bwavOriginationDate].copyToUTF8 (b->originationDate, 11);
        values[WavAudioFormat::bwavOriginationTime].copyToUTF8 (b->originationTime, 9);

        auto time = values[WavAudioFormat::bwavTimeReference].getLargeIntValue();
        b->timeRefLow = ByteOrder::swapIfBigEndian ((uint32) (time & 0xffffffff));
        b->timeRefHigh = ByteOrder::swapIfBigEndian ((uint32) (time >> 32));

        values[WavAudioFormat::bwavCodingHistory].copyToUTF8 (b->codingHistory, 0x7fffffff);...<etc>

Does anyone know if there is any official “it works in compilers X and Y” guarantee, or whether only certain kinds of usage are safe? Or are we literally risking “killing the cat” in all cases? In the JUCE classes it is typically used by allocating memory with a MemoryBlock and then casting to a C++ type. Another example is:

static MemoryBlock createFrom (const StringPairArray& values)
{
    MemoryBlock data;
    auto numLoops = jmin (64, values.getValue ("NumSampleLoops", "0").getIntValue());

    data.setSize (roundUpSize (sizeof (SMPLChunk) + (size_t) (jmax (0, numLoops - 1)) * sizeof (SampleLoop)), true);

    auto s = static_cast<SMPLChunk*> (data.getData());

    s->manufacturer      = getValue (values, "Manufacturer", "0");
    s->product           = getValue (values, "Product", "0");
    s->samplePeriod      = getValue (values, "SamplePeriod", "0");
    s->midiUnityNote     = getValue (values, "MidiUnityNote", "60");
    s->midiPitchFraction = getValue (values, "MidiPitchFraction", "0");
    s->smpteFormat       = getValue (values, "SmpteFormat", "0");
    s->smpteOffset       = getValue (values, "SmpteOffset", "0");
    s->numSampleLoops    = ByteOrder::swapIfBigEndian ((uint32) numLoops);
    s->samplerData       = getValue (values, "SamplerData", "0");

    for (int i = 0; i < numLoops; ++i)
    {
        auto& loop = s->loops[i];

Here, s->loops is the member declared as SampleLoop loops[1].
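For comparison, here is a minimal sketch of reading a header-plus-variable-tail chunk without casting the buffer to a struct type. It is not how JUCE does it: ChunkHeader, ParsedChunk and parseChunk are hypothetical names, the layout is made up for illustration, and byte-order conversion is left out. The idea is to std::memcpy the fixed-size part into a real object and copy the trailing items into a std::vector, so no object is assumed to already live inside the raw bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical fixed-size part of a chunk: everything before the variable-length tail.
struct ChunkHeader
{
    char name[8];
    std::uint32_t numItems;
};

// The parsed result owns its data; the tail lives in a vector instead of a
// one-element array at the end of the struct.
struct ParsedChunk
{
    ChunkHeader header;
    std::vector<std::uint32_t> items;
};

// Copies the header, then the trailing items, out of a raw byte buffer.
// Returns nullopt if the buffer is too small for what the header claims.
inline std::optional<ParsedChunk> parseChunk (const unsigned char* bytes, std::size_t size)
{
    assert (bytes != nullptr);

    if (size < sizeof (ChunkHeader))
        return std::nullopt;

    ParsedChunk result;
    std::memcpy (&result.header, bytes, sizeof (ChunkHeader));

    // Number of whole items that actually fit after the header.
    const std::size_t available = (size - sizeof (ChunkHeader)) / sizeof (std::uint32_t);

    if (result.header.numItems > available)
        return std::nullopt;

    result.items.resize (result.header.numItems);

    if (! result.items.empty())
        std::memcpy (result.items.data(),
                     bytes + sizeof (ChunkHeader),
                     result.items.size() * sizeof (std::uint32_t));

    return result;
}
```

The trade-off is an extra copy and a separate allocation for the tail, in exchange for never indexing past the end of a declared array.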

Of course, there is the other option: allocate enough (correctly aligned) memory, then use placement new, potentially with one of these single-element arrays as the last member. I’ve not encountered any problems on current Clang or MSVC with either the JUCE classes or other implementations (which use the placement new technique).

I think it’s practically impossible to do this in a standards-conformant way because placement new for arrays has a surprising characteristic:

“Array allocation may supply unspecified overhead, which may vary from one call to new to the next. The pointer returned by the new-expression will be offset by that value from the pointer returned by the allocation function.” (cppreference)

This means that, even if you carefully create a pointer with the correct alignment for the type you want to store, and use placement new to create an array ‘starting from’ that pointer, the compiler might add an unspecified amount of padding and muck up the alignment. There was a talk that includes a discussion of this topic at CppCon this year, found here.
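One way to sidestep the array form of new is to construct the elements one at a time, since the overhead described in the quote applies to array new-expressions and not to single-object placement new. The sketch below assumes C++17 and uses std::uninitialized_value_construct_n from <memory>, which constructs each element individually; Loop and createLoops are hypothetical names, not JUCE types. As far as I understand, this is well-defined for storage obtained from malloc, operator new, or an unsigned char array, but I would treat that as my reading and not as an official guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical element type standing in for something like SampleLoop.
struct Loop
{
    std::uint32_t start;
    std::uint32_t end;
};

// Value-constructs 'count' Loop objects in caller-provided storage, one
// element at a time, without using an array new-expression. The first
// element therefore starts exactly at 'storage'.
// Precondition: 'storage' is aligned for Loop and holds at least
// count * sizeof (Loop) bytes.
inline Loop* createLoops (void* storage, std::size_t count)
{
    assert (reinterpret_cast<std::uintptr_t> (storage) % alignof (Loop) == 0);

    auto* first = static_cast<Loop*> (storage);
    std::uninitialized_value_construct_n (first, count);
    return first;
}
```

When the elements are no longer needed, std::destroy_n (loops, count) ends their lifetimes before the storage is released.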

If you want to write portable code that doesn’t trigger UB, I think your options are:

  • Use Clang/GCC with their flexible array member extension enabled
  • Avoid flexible array members completely

Great, thanks! Although that quote appears to relate specifically to array allocation via the new[] variants rather than to a structure containing an array of literals (or POD types). I’m not arguing that this isn’t fundamentally UB, though! :)

I think it applies to all array forms of new, whether they are doing placement new or allocating fresh memory, although I’m not sure about that.