nullptr has been such a Godsend for C++. Good to see it coming to C.
If you ever see the macro NULL in code, be afraid. There are two valid ways of defining the macro, and they cause weird issues when porting code. For example, in the statement printf("%p %s\n", NULL, "Hello world!"), one of the definitions leads to NULL being interpreted as a null pointer, and the other leads to NULL being interpreted as an integer. The latter may crash if integers and pointers are different sizes.
It also causes problems with C++ overloading if one overload takes a pointer and another takes an integer.
> If you ever see the macro NULL in code, be afraid. There are two valid ways of defining the macro
Not on a Posix system, where the only valid definition of it is `(void*)0`. C could have adopted this definition.
Nullptr is needed in C++ because `0` is the only definition of `NULL` that works with the type system, due to the lack of implicit `void*` conversions.
C doesn't have this problem.
Adopting the Posix definition of NULL in the standard would have been sufficient -- and unlike `nullptr`, would have solved bugs in existing programs.
> Nullptr is needed in C++ because `0` is the only definition of `NULL` that works with the type system, due to the lack of implicit `void*` conversions.
> C doesn't have this problem.
Except for conversions between data pointers and function pointers ;)
Initialization seems to be a special case, but with `-pedantic` the following code will show a warning on the initialization of `fp2`:
That's a fine general sentiment. However, in this context it's a problem if you want to assign NULL to a pointer without a cast, which is why C++ added the magically convertible nullptr in addition to the magically convertible `0` constant.
char *x = 0; // ok in C and C++
char *y = (void*)0; // ok in C, error in C++
char *z = nullptr; // ok in C++
therefore:
#define NULL ((void*)0) // Required by Posix C, invalid C++
#define NULL 0 // Pre-nullptr, the only valid C++ definition
C++ can't define NULL the safe way that Posix C does.
I don't understand why it's more acceptable to allow magic `0` conversions than magic `(void*)0` conversions, given that the latter is far less likely to happen by accident -- but here we are.
- NULL is idiomatic: using NULL is entrenched in C programming and it is not going away.
- In spite of nullptr existing now, NULL is still (quite stupidly) not required to just expand to nullptr, but to an implementation-defined null pointer constant, rather than #define NULL nullptr. (According to the N2596 draft).
- They had over 30 years to tighten the requirements on how NULL can be defined; what's the matter? C99 could already have required NULL to be ((void *) X) where X is an integer-typed constant expression evaluating to zero.
I'm not going to start using nullptr. It's not idiomatic C. I'm going to hold out hope that NULL will be fixed so that it expands to nullptr.
--
Also, it's possible for a compiler to diagnose when a constant, zero-valued expression is used as the argument of a variadic function. The diagnostic can be confined to cases when such a constant expression is the result of macro expansion:
> In spite of nullptr existing now, NULL is still (quite stupidly) not required to just expand to nullptr, but to an implementation-defined null pointer constant, rather than #define NULL nullptr. (According to the N2596 draft).
This is so silly. I sort of get why not
(can't break the dork that decided to do
int i = NULL;
i++;
)
But, at the same time... I almost feel like this is a "you are being a dork, go fix your code." moment. This isn't the sort of break where someone would see it and go "Oh yeah, assuming NULL is anything other than nullptr is dumb!"
Why not? We've broken the dork who used undeclared functions, void main, gets ...
(It's the same funking dork anyway. You know who you are, I'm looking at you!)
Note that
int x = ((void *) 0);
will actually work in GCC and get you a zero into x, just with a conversion warning. The dork is unaffected; their code works and they don't read warnings.
See https://gcc.gnu.org/pipermail/gcc/2023-May/241264.html for why this might not be the case anymore soon - although I suppose that adding in -fpermissive or -Wno-error=conversion or something like that isn't too much effort.
But does C need a nullptr keyword? If you're programming in C, you usually define 0 as an invalid value, or a null value. C doesn't have the insane type system C++ has and doesn't have a very strong need to make a distinction between a pointer or an integer, since they're all in the end numbers.
The printf example you gave is an example of garbage in, garbage out. If NULL is a macro not defined as a pointer sized integer, then you're at fault here.
> If you're programming in C, you usually define 0 as an invalid value, or a null value.
That was also the usual pattern in C++ when there was no alternative. Once nullptr was introduced in C++, NULL or 0 quickly became a code smell.
> C doesn't have the insane type system C++ has and doesn't have a very strong need to make a distinction between a pointer or an integer, since they're all in the end numbers.
C++'s type system is far from insane. It's actually one of its killer features.
You're both entirely oblivious to the need not to conflate pointers with integers, and you fail to present any case in favour of the legacy and broken use of NULL, in the process leaving a whole family of known error patterns unaddressed.
> The printf example you gave is an example of garbage in, garbage out. If NULL is a macro not defined as a pointer sized integer, then you're at fault here.
Again, you seem to be completely oblivious to the problem domain. NULL is not a macro as far as C or C++ compilers are concerned. NULL is a magic constant that's resolved at preprocessing time. Replacing NULL with nullptr means a magic constant is replaced by a concrete type, and thus a whole family of errors can be avoided with compile-time checks. Claiming that the developers who wrote in bugs are at fault for inadvertently adding bugs makes no sense, because it does not solve any problem at all and is just cynical finger pointing. I take compile-time checks over unhelpful finger pointing all day every day.
The original mistake by the standards committee was allowing implicit conversions from integer to pointer. I.e. allowing NULL to be defined as simply 0.
If NULL had been defined always as ((void *) 0) then I don't see that we would have had a problem.
But that's all history now and in this situation I can see that adding nullptr becomes a reasonable way out.
It's ironic though that the fix for the different ways to write null is to add yet another way.
As per the C standard, NULL is an implementation-defined null pointer constant.
Macros are resolved in the preprocessing step. The compiler does not know what a macro is. What the compiler knows is whatever the preprocessor passes off in place of the macro. This means the compiler only sees a constant, and has no way to tell what that constant means.
If instead of passing random pointer constants you pass an actual type, now the compiler can tell more things.
> If NULL had been (...)
Irrelevant. The whole point is that it wasn't. The committee looked at the problem and determined that using a dedicated type is safer, more powerful, and more elegant than passing magic numbers around.
> If NULL is a macro not defined as a pointer sized integer, then you're at fault here.
If it was you who wrote stdlib.h, sure; otherwise, if you’re on a platform where NULL is traditionally defined as 0 and not (void *)0, you’re stuck. A conformant implementation is free to use either definition.
If you want to language-lawyer more heavily, C does not require there to be pointer-sized integers (uintptr_t is optional), does not require that all zero bytes represent a null pointer in memory (unlike for integers), does not require that the implementation choose to store an integer with value zero as all zero bytes (there may be other valid representations), and in any case does not require an implementation to do anything reasonable at all if the caller passes an integer but a vararg callee looks for a pointer (think separate integer and pointer registers).
[I’m not entirely sure if (void *)(void *)0 is a null pointer constant (though it’s certainly an expression that evaluates to a null pointer)—does it count as a zero-valued integer constant expression cast to a pointer to void? So you might not even be able to use (void *)NULL as a hedge against bad platform headers.]
I don’t think you are? Redefining a reserved identifier is UB per ISO C (any version) 7.1.3p2, and per 7.1.3p1,
> Each macro name in [the standard library] is reserved for use as specified if any of its associated headers is included; unless [you’re #undef’ing a function also provided as a macro].
The general idea seems to be that standard headers are allowed to use macros they define, even in other macros they define, and because macro names are late-bound (ugh), even if the user only redefines the name afterwards, every macro that uses it will then be affected.
As a silly example, a valid part of stdlib.h could be
Do it where it works (hopefully some platform for which you build all the time), and get your enhanced diagnostics there; avoid it where there are problems like this.
Assuming 0 is an invalid value is not always correct. 0 is a perfectly valid pointer, and making it impossible to refer to that location is bad. Of course, if you are not writing an OS or embedded system you won't ever have a pointer value of 0 anyway, as the OS can put things elsewhere with no problem (if you are, check your CPU docs: on some CPUs 0 is invalid, on some it is not).
Umm, no. 0 is the null pointer constant, same as nullptr. It is not a location, but an abstraction. If a platform's null pointer happens to be the address 0xFFFFFFFF, then 0 will produce that.
There is no difference between
char *p = nullptr;
and
char *q = 0;
other than the variable name; the two have to compare equal: (p == q).
What's wrong with 0 is that when it's not in a context where it's being converted to a pointer type, it's just an integer.
The problem is if 0 is a valid pointer and I write
volatile int *x = 0;
*x = 0x1234;
Did I just dereference the null pointer or make a valid write to that memory location? There is no way to know for sure; you can only apply heuristics to make a guess.
Of course if the lines are that closely spaced you can guess, but in real code they can be in different translation units.
ISO C says that x isn't a valid pointer for dereferencing, so *x = 0x1234 is undefined behavior. If in your environment that pointer refers to some location where you can put a value, and this is documented, then you're using a documented extension.
If you're using some environment in which x isn't the zero address, but you do want the actual zero address, you need some other way to obtain it, like converting a non-constant expression:
const int x = 0;
char *zeroaddr = (char *) x;
C implementations are not required to diagnose null dereferences. In many familiar environments, it's arranged by the way the program is loaded: an unmapped page of virtual memory is put at that address. That scheme can be defeated if an offset is involved, as in ptr[large_offset], or ptr->member where member is at a large offset into the struct type.
It's possible to have run-time checks for a null pointer as a compiler option, with a run-time penalty. Other than that, you can use assertions to defend against them.
0 never should have been overloaded in C to refer to the NULL pointer. With pointer assignment and comparison it transforms to the platform's encoding for NULL which isn't necessarily all zeros. No other literal has this sort of magic.
This has nothing to do with the preprocessor. The concept of NULL existed before the macro was standardized. Literal zeros were the way to refer to it which was a design mistake.
I think it's a fine design. Obviously, the C++ people who invented
virtual void fun() = 0;
were blind to whatever mistake it embodies.
When I gained the proper understanding that 0 is a null pointer constant, I immediately stopped using NULL. It's so much nicer.
- It works fine in a typeless language in which every value is a machine word, and integers and the null pointer share exactly the same representation.
- It works fine in a strongly typed language, where context indicates what zero means, even in the context of different types of different sizes and representations.
Unfortunately C (and also C++) contains some untyped areas, like variadic argument lists.
I would have designed it differently: only the token 0 would have the special overloaded meaning, and not all constant zero-valued expressions of any integer type. So (0 + 0) would not be a null pointer.
In unsafe contexts, the use of a 0 token would be diagnosed.
printf("...", ... 0, ...); // diagnosed
printf("...", ... 0L, ...); // not diagnosed: argument of long type
printf("...", ... 0 + 0, ...); // not diagnosed: argument of int type
printf("...", ... (void *) 0, ...); // not diagnosed: null pointer of (void *) type.
I think I would also not have the hexadecimal or octal zero tokens be the null pointer constant:
char *p = 0; // null pointer, no diagnostic
char *q = 0x0; // error, integer to pointer with no cast
The downside is that the AST would have to retain that representation detail somewhere (possibly just in a single Boolean flag).
This isn't just a C++-ism. The null pointer constants are only more prominent in C++ because of the rejection of void * even though it isn't any "safer" to have a special integer literal with the same semantics. Ultimately, it all comes from K&R C.
There are legitimate cases where you need a function pointer assigned to address zero (reset vectors commonly). The correct behavior in C is ambiguous if the null address isn't also 0 since the standard doesn't call out special behavior for function pointers. That wouldn't be the case if nullptr had been standardized earlier and there was no need for the magic 0 as a null pointer constant.
That's a compiler extension. In C17, 6.5.16.1 (Simple assignment) implies that the RHS of an assignment to a pointer must either have pointer type or be a null pointer constant (i.e., an integer constant equal to 0, or such a constant casted to pointer type), and 6.7.9 (Initialization) states that "the same type constraints and conversions as for simple assignment apply" to expressions used as initializers.
This particular rule is essentially unchanged from C89 (3.5.7 for initialization, 3.3.16.1 for simple assignment) and from C99 (6.7.8 for initialization, 6.5.16.1 for simple assignment). Generally, most existing rules don't change much at all across the C Standard versions.
The address is still 3 which has valid applications. C is permissive enough to run on platforms that don't use address 0 for NULL. With pointer operations the compiler will change the encoding from 0 to that platform's NULL address.
int *p = 0;
intptr_t i = (intptr_t)p;
if(i == 0) ... // Isn't always true
Well, the fault depends on who "you" are: the NULL macro generally comes from one's libc, and allegedly some libc maintainers have been very obstinately against changing their NULL macros to have pointer type.
Aren't there platforms where pointers have additional type or space information encoded that is orthogonal to the numeric address? It's only by convention that NULL == 0 because on platforms like Intel & ARM you would typically not use the first page. But that's only a convention, and you could just as easily put a null page at the top of your address space, especially in systems with an MMU where mappings can be added, removed, or remapped as-needed.
> It's only by convention that NULL == 0 [...] and you could just as easily put a null page at the top of your address space [...].
Technically NULL == 0 always because the standard special-cases zero-valued integer constant expressions; (uintptr_t)NULL == 0 or NULL == *(void **)calloc(1, sizeof(void *)) is another matter :)
Language lawyering aside, a non-all-zeroes representation of NULL will probably blow up most C programs [e.g. static-storage-duration initialization is now not the same as calloc or memset(,0,) and is even type-specific]. Like CHAR_BIT, that’s a joint that technically exists but has been rusted for decades (pun not intended).
There is no problem with static initializations with a null pointer that is not all zero bits, or a floating-point 0.0 that is not all zero bits.
Those values just cannot participate in the "BSS" trick, whereby everything that is zero-initialized is put into a special section that doesn't actually exist in the program image, and is only provided on startup.
Those values would go into the initialized data section.
The problem with 0.0 or null pointers not being all zero bits is all the code that uses calloc or memset zero.
If this is on some specialized platform (e.g. DSP chip), it might not matter that vast quantities of C code are not portable.
In general, compiler (and to a great extent instruction set architecture!) designers are quite hamstrung by the expectations of C programmers and programs; that has been the situation for some thirty years now.
Today, you could not successfully introduce a system in which pointers to bytes (void *, char *) have a different representation from other pointers (let alone a different size, lord forbid).
Hardware memory tagging schemes carefully avoid breaking common C idioms.
What breaks are
- truly dubious programs which use pointers in ways they really shouldn't, and are almost certainly just bugs; like use-after-free. When a pointer with an out-of-date tag is passed to the library, that is pretty much a confirmed use-after-free, which is not a legitimate idiom of non-maximally-portable C, like assuming that all pointers are the same size.
- C programs which manipulate the pointer representation: e.g. run-times for dynamic languages that put their own tags in pointers. These can easily work around tagging. (I've dealt with this on Android recently, quite easily. For the affected objects, I strip the tag away, and work with the untagged pointers. When it's time to pass the pointer back to Android's library, I put the tag back. All other code works as before.)
Incidentally, AFAICT C23 now allows stashing log2 alignof(max_align_t) bits in pointers in a portable (if awkward) manner: for example, for a char pointer p, the lowest such bit can be retrieved with memalignment(p)<2 and masked off with p-(memalignment(p)<2).
(C++’s std::aligned would be less awkward, but, well, memalignment() is what we get.)
You still can’t portably store pointers and integers (or doubles) in the same place and distinguish them, though.
In C, the integer 0 is explicitly defined to convert to a null pointer for all assignments, casts, comparisons, etc., regardless of what the pointer's "actual" value is. The only time where you can see that a null pointer doesn't have numeric value 0 is when you manipulate its object representation with memset, memcpy, etc. The compiler is also at liberty to return whatever it wants when you convert a null pointer to an integer, except that converting it back must produce a null pointer (if it's at least as wide as intptr_t).
Particularly when using structs, this removes a lot of ambiguity without having to chase indirection to find the enum's underlying type (or encode it in the name, Hungarian style).
enum D : uint8_t {
A = 0,
B = 1,
C = 2
};
typedef struct {
enum D f;
} __attribute__((packed)) E;
static_assert(sizeof(E) == 1);
etc. could make grokking protocol declarations with enums less onerous, requiring one less level of indirection.
As a sneering C++ programmer, why are you even reading / commenting on a new C standard? This is basically a "if you don't have anything nice to say, don't say it" situation.
Honestly, because there's very little c++ content here on HN and a relatively large amount of C content. Most of the C content is full of people saying "we don't need X from C++" but the reality is most of these things have significant uses
Neither of those statements really matches my experience with HN (high C to C++ content ratio, lots of comments rejecting advances first added to C++). I totally agree some of these things are very useful and I'm glad to see them formalized in C (even years later than C++).
They're too busy looking for their vowels to reply here.
Joking aside, I've a healthy amount of respect for rust, and I hope that many of the ideas make their way into other languages. The terseness, heavy use of macros, _insane_ compile times (and that's coming from someone who writes templates in c++), the general assumption by third-party crates that you're on Linux, and the IDE support combine into something that just isn't usable for me just yet. Maybe in a few years!
Who is the audience for new features in C? And who is driving stuff through the standardisation process? Is this stuff likely to make its way through to embedded toolchains? Or is this for people who are maintaining existing codebases?
Changes to the Standard usually happen as a result of defect reports (confusing details that implementation writers want clarity on) or vast enough general adoption (unifying how implementations were differently achieving the same thing).
As for the audience, it's all the C developers, the open-source and commercial compiler implementations, vendors of libraries, tooling, services, learning material and everything else built in C; which is just innumerable.
Each Standard version released supersedes and obsoletes the previous versions. Intentionally, the versions are meant to be as backwards compatible as possible so that one can mix and match C89/C99/C11 codebases with minimum effort.
C has gained only a handful of features in the last 40 years, compared to the great many things that have been improved w.r.t. undefined/implementation-specific/unspecified behaviors, or removed to keep up with modern times (e.g. trigraphs, and integer representations other than Two's Complement).
I'd say: (1) upgrading is not the spooky thing people make it out to be. Go, Rust, they all move much faster than this and have very ambitious big design ideas on their mind. (2) It's necessary to take good care of C as it, and the things built in it, will realistically outlive many of us.
The early adopters are usually transpilers (or code generators) which can quickly take advantage of new features without the effort of rewriting an entire codebase.
In the same way that Rust used underlying `const` attributes in LLVM (and found all the weird edge cases), and Nim used C as an intermediate as have many other lisp or object-ish languages.
Yes, I'd expect they will. Most embedded toolchains these days are built around GCC. So as GCC grows new features, embedded toolchains will get them too.
Officially adopting __auto_type as auto is good. Unfortunate that N2953 dropped function return type and parameters.
This feature composes with _Generic macros quite well:
#define div(X,Y) _Generic((X)+(Y), int: div, long: ldiv, long long: lldiv) ((X), (Y))
auto res = div(38484848448, 448484844);
auto a = b * res.quot + res.rem;
It also lets you get rid of all the "typeof(X)" foo in macro definitions.
> This proposal also recommends adoption of Unicode normalization form C (NFC) for identifiers to ensure that when compared, identifiers intended to be the same will compare as equal. Legacy encodings are generally naturally in NFC when converted to Unicode. Most tools will, by default, produce NFC text.
Er, a much better approach is to allow unnormalized Unicode in source code and use form-insensitive matching of symbol names so that all forms of a symbol are equivalent. This can be done by normalizing during the parse, or by implementing form-insensitive string comparison and hashing functions that normalize glyph by glyph as needed -- the latter can be very fast for all-ASCII and mostly-ASCII symbols!
The reason this is a better way is that there's too many places that don't produce NFC. For example, HFS+ uses NFD, so if you cut-n-paste a file name from HFS+ into other contexts, you'll be pasting NFD unless the cut-n-paste system normalizes to NFC. Also, while it's true that input modes typically produce NFC, it's more that they produce NFC for a small subset of Unicode, not that they will normalize other forms seen on input. Using form-insensitive string comparison/hashing/matching yields a better user experience at not that much implementation cost: you're gonna need a Unicode library, and that library will need to have normalization support, so you can implement form-insensitivity.
> Er, a much better approach is to allow unnormalized Unicode in source code and use form-insensitive matching of symbol names so that all forms of a symbol are equivalent.
Linkers will often be blissfully unaware of Unicode or any form of localization. This was the impetus for UTF-8, so that the bulk of software which is 8-bit clean or which operates on opaque, NUL-terminated strings can continue working as-is. This can't be changed without breaking backwards ABI compatibility; therefore, it's very unlikely to change.
There are countless half-measures that could be taken, but few if any are suitable for standardization. If the history of software localization is any guide, in the face of strict, forward-looking specifications various vendors and ecosystems will likely go their own way, with the one sure thing being a failure to fully adopt or properly implement the specification.
Yes, the compiler should normalize symbols before writing object files, no doubt. I'm talking about the inputs though (the source files), which should not have to be normalized.
Only a tiny part of the Unicode database: the normalization tables. Problem is, these tables have to be updated every year, and they don't keep in sync with similar tables elsewhere, such as glibc's.
And Unicode identifiers are entirely insecure, because they are not identifiable. My proposal was postponed to C26.
> Problem is, these tables have to be updated every year, and they don't keep in sync with similar tables elsewhere, such as glibc's.
The language standard can avoid this by committing to a single Unicode version for each language version.
> And Unicode identifiers are entirely insecure, ...
Eh, it depends on what we're talking about using them for. If it's symbols in object files, it's not really a problem. More importantly confusables are unavoidable -- even ASCII all by itself has confusable characters (1 and l for example). The thing to do is to forbid arbitrary mixing of scripts, allowing only those scripts that are relevant in the relevant contexts. For DNS this means that DNS TLD registries should come up with per-registry rules, for example.
Can you expand on what the security concerns are as to confusables in symbol names in C source code? Clearly there's a security concern with cut-n-paste, but that's true regardless of what the rules might be for C identifiers.
It's not like it's obvious that UTR#39 applies literally everywhere that there are "identifiers".
Also, can you speak to what is the security concern with form-insensitivity (rather than confusables) as to symbols in input source files? I just don't see a concern at all there, but maybe I'm missing something.
Lastly, I think `#include` is the most important place to get this right since that does interface with the world outside the compiler (specifically: the filesystem), but as you note the filesystems mostly are just-use-8bit -- very few filesystems normalize on create (HFS+) or are form-insensitive (ZFS). The other place to get this right is on the object file output side, where symbols definitely must be normalized.
Oh, one more thing: the platform might impose some rules regarding symbols in ELF and any other object file formats. Are they known to? I suppose C can't necessarily cater to all platform-imposed limitations on symbol naming, but it'd be useful to know about them.
The compiler is going to need a Unicode library anyways.
It's true that checking that some string is in some canonical form (say, NFC) is easier than normalizing it. In fact, it's even easier to check that some string is in NFD than it is to check that it's in NFC because NFC is defined in terms of NFD. That's not enough to justify not doing form-insensitive string matching in the compiler and forcing a pre-compilation normalization step.
Also, you'd only want symbols normalized, not string literals (since one might need a string literal that is not in a canonical form, or not in NFC). Thus the pre-compilation normalization step would have to be language-aware, so it might as well be part of the compiler and not a separate step.
But I'd like to convince you and everyone that in general we want to accept inputs in any form and be form-insensitive because that's much more user-friendly.
We don't in fact have universal NFC-only input modes. We have accidentally NFC-mostly input modes -- accidental because typically we have transcoding from legacy codesets, which yields NFC, but no normalization is actually done so that copy-paste operations can cause non-canonical input to be provided to applications. We don't have universal agreement on NFC. This makes for a mess with occasional user-visible problems.
Form-insensitivity is essentially the same as normalizing character-by-character when hashing or comparing strings, but this can be very fast for mostly-ASCII text, so form-insensitivity can be very fast.
For string comparison, form-insensitivity is faster than normalizing first: either the strings are equal in content and probably equal in form, or they differ early, possibly at some character where normalization is not even necessary to determine the result. So less work need be done to compare strings form-insensitively than to first normalize all inputs (unless so many comparisons will be done that normalizing first might be a win). For string hashing, form-insensitivity is not faster, because one has to normalize every input, whereas if all inputs are normalized once then one need never normalize to hash; but form-insensitivity still yields a better user experience.