
This paragraph is the key:

> The QUERY method provides a solution that spans the gap between the use of GET and POST. As with POST, the input to the query operation is passed along within the content of the request rather than as part of the request URI. Unlike POST, however, the method is explicitly safe and idempotent, allowing functions like caching and automatic retries to operate.



But doesn't this raise the question: why not GET with query params?

I'm not necessarily against it - I've had the urge to send a body on a GET request before (can't recall or justify the use case, however).

Reasons I can think of:

- browsers limit URL (and therefore query param) size

- general dev ergonomics

The practical limits (i.e., not spec-defined limits) on query param size seem like a fair concern, but it's worth mentioning that there are practical limits on body size as well, introduced by web servers, proxies, cloud API gateways, etc., or ultimately by hardware. From what I can see this is something like 2 KB for query params in the lower-end case (the often-cited 2083-character limit, which comes from Internet Explorer rather than Chrome) and 10-100 MB (but highly variable and potentially only hardware bound) for body size.

In both cases it's worth stating that the spec is being informed by practical realities outside of the spec. In terms of the spec itself, there is no limit to either body size or query param length. How much should the spec be determined by particular implementation details of browsers, cloud services, etc.?
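To make the size comparison concrete, here is a rough sketch that encodes the same query both ways and checks it against a URL budget. The `URL_LIMIT` value, the `example.org` endpoint, and the sample queries are assumptions for illustration only; real limits vary by client, proxy, and server.

```python
import json
from urllib.parse import urlencode

# Assumed conservative URL budget, based on the ~2 KB figure mentioned above.
URL_LIMIT = 2083

def as_url(base, params):
    """Encode the query into the URI, the way a GET request would carry it."""
    return base + "?" + urlencode(params, doseq=True)

def as_body(params):
    """Encode the same query as a JSON request body (QUERY or POST style)."""
    return json.dumps(params).encode("utf-8")

def fits_in_url(base, params, limit=URL_LIMIT):
    """True if the fully encoded URL stays within the assumed budget."""
    return len(as_url(base, params)) <= limit

# Hypothetical queries: a tiny one and one with a thousand ids.
small = {"symbol": "AAPL", "date": "2021-07-21"}
large = {"ids": [str(n) for n in range(1000)]}
```

The small query fits comfortably in a URL, while the thousand-id query blows past the assumed budget even though its JSON body is only a few kilobytes.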


With a request body you can also rely on all the other standard semantics that request bodies afford, such as different content types and content encodings. Query parameters are also often considered less secure for things like PII, and (less important these days) query parameters don't really define a character encoding.

But generally the most important reason is that you can take advantage of the full range of MIME types, and yes, practically speaking there's a limit on how much you should stuff in a query parameter.

This has resulted in tons of protocols using POST, such as GraphQL. This is a nice middle ground.
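As a small illustration of the body semantics mentioned above, this sketch labels a query with a media type, a charset, and a content encoding, then decodes it again on the receiving side. The helper names `encode_body` and `decode_body` are made up for this example; only the header names are standard HTTP.

```python
import gzip
import json

def encode_body(query):
    """Serialize a query as gzip-compressed JSON and describe it in headers."""
    raw = json.dumps(query).encode("utf-8")
    compressed = gzip.compress(raw)
    headers = {
        "Content-Type": "application/json; charset=utf-8",  # media type + charset
        "Content-Encoding": "gzip",                          # how the bytes are wrapped
        "Content-Length": str(len(compressed)),
    }
    return headers, compressed

def decode_body(headers, body):
    """Reverse encode_body using only what the headers declare."""
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))
```

None of this metadata has an equivalent for a query string, which is one reason a body-carrying method is attractive.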


Servers have to limit header size as well, since content-length isn't available. Without a bounded header size the client can just send indefinitely.

Long query strings are also just impossible to read, and URL encoding is full of parser alignment issues. I have to imagine that QUERY would support a JSON request body.
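For what it's worth, nothing stops you from experimenting with a JSON body on QUERY today. Below is a minimal sketch using only the Python standard library: a toy server that answers `QUERY /items` and a client that sends the request. The `/items` path, the `max_price` field, and the `ITEMS` data are all invented for the example, and this is not taken from the draft itself.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data set for the demo.
ITEMS = [{"name": "apple", "price": 3}, {"name": "pear", "price": 5}]

class Handler(BaseHTTPRequestHandler):
    # BaseHTTPRequestHandler dispatches on "do_" + method name,
    # so a do_QUERY method is enough to accept QUERY requests.
    def do_QUERY(self):
        length = int(self.headers.get("Content-Length", 0))
        query = json.loads(self.rfile.read(length))
        # Read-only: filter the data and never modify it (safe and idempotent).
        hits = [i for i in ITEMS if i["price"] <= query["max_price"]]
        body = json.dumps(hits).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the example quiet
        pass

def run_query(max_price):
    """Start a throwaway local server, send one QUERY, return (status, results)."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_port)
        conn.request("QUERY", "/items",
                     body=json.dumps({"max_price": max_price}),
                     headers={"Content-Type": "application/json"})
        resp = conn.getresponse()
        result = resp.status, json.loads(resp.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Whether intermediaries between a real client and server pass an unfamiliar method through is a separate question that this local test does not answer.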


Size can be a very practical concern if your query parameter is e.g. a picture.

Not showing the URL in various logs can be a concern if your query parameters are sensitive.


A picture as an input for a GET seems like a very strange use-case. Can you elaborate?

I do agree on the quirkiness of encoding a bunch of data in the URL. It's nice to decouple the endpoint from the input at a more fundamental level.


Use cases like reverse image search come to mind


Reasons:

- URI length limits (not just browsers, but also things like Jersey)

- q-params are ugly and limiting by comparison to an arbitrarily large and complex request body with a MIME type

However, q-params are super convenient because the query is then part of the URI, which you can then cut-n-paste around. Of course, a convenient middle of the road is to use a QUERY which returns not just query results but also a URI with the query expressed as q-params so you can cut-n-paste it.


Query params are also extremely commonly logged all over the place and so putting sensitive information in them is almost always a bad idea.


Of course. The security considerations section does mention this as a feature of `QUERY`, saying that the `Location:` returned by the server should not encode all the details of the request.

However it's also true that q-params effectively form part of the UI. I'm certain you've edited URIs before -- I have, and I know not-so-knowledgeable people who do it too.

Striking a balance here is not easy. With `QUERY` the server can decide how much of the query to encode into the `Location:`, if any of it at all. The server might use knowledge of the "schema" that the query refers to, or it might use the syntax of the query (if it supports indicating sensitive portions), or it might only "link-shorten" the whole query.
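One way to picture that decision, as a rough sketch: if the query touches no sensitive fields, echo it back as q-params; otherwise store it server-side and hand out an opaque link. The `SENSITIVE` set, the `/saved/` path layout, and the in-memory `STORED` dict are hypothetical, not anything the draft prescribes.

```python
import secrets
from urllib.parse import urlencode

# Hypothetical field names this server treats as sensitive.
SENSITIVE = {"ssn", "email"}

# Opaque id -> stored query (a real server would persist and expire these).
STORED = {}

def location_for(path, query):
    """Pick a value for the Location header that is safe to copy around."""
    if SENSITIVE.isdisjoint(query):
        # Nothing sensitive: expose the whole query as editable q-params.
        return path + "?" + urlencode(sorted(query.items()))
    # Otherwise "link-shorten" the whole query behind a random token,
    # so no part of it appears in URLs or logs.
    token = secrets.token_urlsafe(12)
    STORED[token] = dict(query)
    return f"{path}/saved/{token}"
```

A random token is used rather than a hash of the query so that the link reveals nothing about low-entropy values such as an email address.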


This avoids changing the definition of GET. Who knows how many middle boxes would mess with things if you did that because they “know” what GET means and so their thing is “safe”.

Until GET changes.

People aren’t using QUERY yet so that problem doesn’t exist.


> Unlike POST, however, the method is explicitly safe and idempotent, allowing functions like caching and automatic retries to operate.

A shitload of answers to GET requests, although cacheable, are stale though. If you issue a GET for a page which contains stuff like "Number of views: xxx" or "Number of users logged in: yyy" or "Last updated on: yyyy-mm-dd" there goes idempotency.

Some GET requests are actually idempotent ("Give me the value of AAPL at close on 2021-07-21") but many aren't.

Stale data won't break much but there's a world between "it's an idempotent call" and "I'm actually likely to be seeing stale data if I'm using a cached value".

I mean... Enter in your browser "https://example.org", that's a GET. And that's definitely not idempotent for most sites we're all using on a daily basis.


Idempotence does not mean immutability, it means that two or more identical operations have the same effect on the resource as a single one. Since GET operations, by virtue of also being safe, generally have no effect at all, this is almost always trivially true. Just because the resource's content changed for some other reason doesn't mean GET is not idempotent.


Correct.

GET, DELETE, HEAD, OPTIONS, PUT are idempotent.

POST is not. (Thus the existence of Idempotency-Key, etc.)
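For anyone who hasn't seen the Idempotency-Key pattern, here is a toy sketch of the idea: the server remembers the response for each key and replays it on a retry instead of repeating the side effect. The `PaymentAPI` class and its fields are invented for illustration and do not follow any particular vendor's API.

```python
class PaymentAPI:
    """Toy POST endpoint that de-duplicates retries by Idempotency-Key."""

    def __init__(self):
        self.payments = []  # the server-side state that POST mutates
        self.seen = {}      # Idempotency-Key -> response already returned

    def post_payment(self, amount, idempotency_key=None):
        # A retry with a known key replays the stored response, no new payment.
        if idempotency_key is not None and idempotency_key in self.seen:
            return self.seen[idempotency_key]
        self.payments.append(amount)
        response = {"id": len(self.payments), "amount": amount}
        if idempotency_key is not None:
            self.seen[idempotency_key] = response
        return response
```

Without a key, two identical POSTs create two payments; with the same key, the second call is a no-op that returns the first response.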


Idempotency in the context of HTTP requests isn't about the response you receive, but about the state of the resource on the server. You're supposed to be able to call GET /{resource}/{id} any number of times without the resource {id} being different at the end. You can't do the same for POST, as POST /{resource} could create a new entity with every call.

A view counter also doesn't break this, as the view counter isn't the resource you're interacting with. As long as you're not modifying the resource on a `GET` call, the request is idempotent.
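A tiny model may help here. The sketch below treats idempotence as "applying the operation twice leaves the same server state as applying it once", which is the definition being argued for in this subthread. The `Store` class is a made-up stand-in for a server and does not model responses at all.

```python
class Store:
    """Toy server state: a mapping of id -> value."""

    def __init__(self):
        self.items = {}

    def get(self, item_id):
        return self.items.get(item_id)        # reads only, no state change

    def put(self, item_id, value):
        self.items[item_id] = value           # same final state however often repeated

    def delete(self, item_id):
        self.items.pop(item_id, None)         # gone after one call, still gone after two

    def post(self, value):
        new_id = max(self.items, default=0) + 1
        self.items[new_id] = value            # every call creates another entity
        return new_id

def state_after(op, times):
    """Apply op to a fresh store (seeded with one item) the given number of times."""
    store = Store()
    store.put(1, "a")
    for _ in range(times):
        op(store)
    return dict(store.items)

def is_idempotent(op):
    """Idempotent here means: twice has the same effect on state as once."""
    return state_after(op, 1) == state_after(op, 2)
```

Under this model `get`, `put`, and `delete` pass the check and `post` fails it, regardless of what any of them would return to the client.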


Strictly speaking, an inline view counter, which is incremented on each GET and included in the response, would break idempotence. Similarly, a PUT request that implicitly modifies a "last updated" field would also break idempotence. These are pretty minor violations, though, which arguably don't change any "semantics" of the response.


If the view counter is part of the resource itself, then yes, incrementing it on GET breaks the idempotence contract - but it should be obvious that by breaking idempotence, you're breaking idempotence. If it's part of the response without being part of the resource, you're not breaking idempotence - otherwise things like updating a JS library would also break idempotence, but it doesn't, since it's not part of the resource.


Updating a JS library has nothing to do with idempotence. The JS libraries used in <script> tags by HTML pages are absolutely part of the resource. But they aren't changed by GET requests, they're (usually) changed by the site admin out-of-band. Idempotence is not immutability; the content of a resource can change between two identical GET requests, it just can't change because of those requests.


> Updating a JS library has nothing to do with idempotence. The JS libraries used in <script> tags by HTML pages are absolutely part of the resource.

No, they absolutely aren't, and this is a really important distinction. Not everything contained in the response is part of the resource.

The resource is essentially the data living on the server. It doesn't matter what the response looks like or how it's formatted - as long as the same data is transferred, you're referring to the same resource (e.g. `/{resource}/{id}.html` and `/{resource}/{id}.json` can be different representations of the same resource). This means that changing ancillary response data, i.e. non-resource response data, doesn't change the resource, because it's not part of the resource.

If you were correct, the resource would have to contain the JS libraries used. They would have to be part of the data model. Have you ever seen an application like that? Where all JS libraries, CSS files and so on are duplicated into every single resource, and updating the files means updating every single resource entry in your database? Where a JSON API also serves all the JS libraries used in the HTML representation of the resource? And mind you, we're not talking about a "HTML page builder" or something, but about any CRUD application. Usually these things live outside of the resources, in templates or similar.


Ok, I think we're talking past each other a bit.

If on index.html, the script src is "example-1.2.3.js" and you change it to "example-1.2.4.js" then yes, you have changed the resource identified by index.html.

If instead you have a script src of "example.js" and you simply change the content served at example.js, this does not change the resource index.html.

This rarely has anything to do with method idempotence, because these changes are usually made out-of-band (classically, by uploading new files to an FTP server).


No, this is simply not true. If you change "example-1.2.3.js" to "example-1.2.4.js" in the response to `/{resource}/{id}`, it does not change the underlying resource. The included script is simply a part of the representation of the resource, but it's not the resource itself. The representation of the resource can change without the resource itself changing.

https://httpwg.org/specs/rfc9110.html#rfc.section.3.2

You can change "index.html" as much as you want to, as long as the resource it represents stays the same.

"index.html" is not a resource, it's a representation of a resource. If you still disagree, please explain how "index.html" and "index.json" can be representations of the same resource.


They are not the same resource. Every resource is uniquely identified by its URI (normalized, ignoring the fragment), and so https://example.com/index.html and https://example.com/index.json are therefore not the same resource.

Perhaps what you are thinking of is content negotiation. I can request https://example.com/index.html and (despite the name) get JSON back, either because the server is cheeky or because I said "Accept: application/json" or similarly expressed a preference for JSON over HTML. Assuming both JSON and HTML exist at the same URI, these would be two representations of the same resource (accessed from the same URI, but with different headers). Broadly speaking, however, this mechanism is not often used outside of certain specialized protocols, and so it generally doesn't matter: changing the HTML bytes of index.html changes the resource because HTML is the only representation. Per the contract that specifies method idempotence and safety, GET index.html should therefore never cause the HTML content I receive back to change. However, the HTML content can change between requests for other reasons.
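To illustrate the content negotiation case, here is a simplified sketch where one URI serves two representations rendered from the same underlying state, chosen by the `Accept` header. The `RESOURCE` data, the renderers, and the HTML default for `*/*` are assumptions; the parser handles only plain media types and `q` values, not the full grammar.

```python
import json

# Hypothetical resource state; both representations are rendered from it.
RESOURCE = {"title": "Home", "views": 3}

RENDERERS = {
    "text/html": lambda r: f"<h1>{r['title']}</h1>",
    "application/json": lambda r: json.dumps(r, sort_keys=True),
}

def parse_accept(header):
    """Return (media_type, q) pairs, highest q first. Deliberately simplified."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0], 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        if media_type:
            prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def negotiate(accept_header):
    """Pick a representation for the single resource, or None if none is acceptable."""
    for media_type, q in parse_accept(accept_header):
        if q == 0:
            continue  # q=0 means "explicitly not acceptable"
        if media_type == "*/*":
            media_type = "text/html"  # assumed server default
        if media_type in RENDERERS:
            return media_type, RENDERERS[media_type](RESOURCE)
    return None  # a real server would typically answer 406 Not Acceptable
```

A real server would also send `Vary: Accept` so caches key the stored response on the negotiated representation.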

Whether changing only one representation of a resource is the same as changing the "whole" resource somehow is a bit moot, because the specific representation is what's consumed by the client and stored by caches. The ETag and cache parameters are tied to the specific representation as well, and can't be "smeared" across all representations. Practically speaking, the resource is inseparable from its representation, and thus two different representations are treated by well behaved HTTP clients and caches the same as two different resources, even if a higher-level protocol (SOAP, WebDAV, etc.) might treat them as semantically equivalent.

This is my read of RFCs 9110 and 9111; if you have strong evidence otherwise, I'm open to it.


I suppose the counterargument is that "users logged in" or "views" is stale all the time, because it could have changed while the response was in flight.

If you really need live data for "views" or what have you, then perhaps the front end should be querying the backend repeatedly via a separate POST.


Yes. I was skeptical at first. Why add another method? This explains why.


GraphQL reads are like that. You POST a query with the content inside the request body. This would be a nicer verb for it.


OData provides GraphQL-style functionality using GET + URL parameters, but for large queries that's really hard to read or edit. Using the request body makes sense for readability, so OData also offers the option to query with POST + request body.



