Stop using DICT dictionary apps (such as GNOME/MATE Dictionary) (ctrl.blog)
54 points by mritzmann on Aug 13, 2022 | 61 comments


>For example, it is inadvisable to look up information about abortion from within some U.S. states, war crime in Russia, or democracy and human rights in China.

This seems to be an exaggeration. I live in Russia, and the phrase "war crime" itself, taken alone and out of context, is safe to use and not punished. Maybe I'm reading an article about American war crimes in Vietnam, who knows. I would never think it's dangerous to say this phrase. Unless it's a public statement, it's pretty safe. I don't know about China, but I doubt it's a huge problem in the US either.


Maybe you don't get punished, but maybe you end up on an agency watch list and they check your other web interests.


That's getting wildly speculative. A government potentially flagging a behavior that it's potentially monitoring, and potentially adding it to a potential watchlist for it?


It has happened before, just because of the wrong search terms:

https://www.theverge.com/2013/8/1/4580654/michele-catalano-g...


Same thing in the US. Searching for "abortion" is not illegal. Anyone who thinks so is paranoid.


It is not paranoia when someone is actually out to get you.


Searching Google for "abortion" is not going to get someone after you.


Having a miscarriage after you googled the word abortion will turn a tragic situation into an even worse one.


Returning to the topic of the article, how does DICT, a network protocol, make a difference here? Are there any states where traffic gets recorded or queried for keywords? That seems like a huge privacy violation and would make for a more interesting article.
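To make the protocol side concrete: DICT (RFC 2229) is a line-based plaintext protocol, normally on TCP port 2628, so unless something else wraps the connection, the looked-up word crosses the network in the clear. A minimal Python sketch of the command a client writes to the socket; the backslash-quoting rule is assumed from the RFC rather than checked against any particular client:

```python
DICT_PORT = 2628  # IANA-assigned port for DICT (RFC 2229), plaintext


def define_command(word: str, database: str = "*") -> bytes:
    """Build the DEFINE command a DICT client sends for one lookup.

    The word is wrapped in double quotes so phrases like "war crime" are one
    argument; backslashes and quotes are backslash-escaped (assumed RFC rule).
    """
    escaped = word.replace("\\", "\\\\").replace('"', '\\"')
    return f'DEFINE {database} "{escaped}"\r\n'.encode("utf-8")


if __name__ == "__main__":
    # Anyone on the path to the server can read exactly these bytes.
    print(define_command("war crime"))  # b'DEFINE * "war crime"\r\n'
```

Nothing in this exchange is encrypted, which is the article's point; whether anyone actually inspects it is the separate question raised above.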


This seems like pointless scaremongering to me. Nobody's getting thrown into a re-education camp for looking up a word in a dictionary. This kind of cartoonish exaggeration of how totalitarian regimes operate only muddies the water on the actual harm done by such governments.


All else being equal, more privacy is always good. I think the examples given are just there to get normal people to care about this too.

Similar to when abortion was outlawed in the US and suddenly people cared a lot about all that Google Maps location history that Google was saving. It's much easier to convince people to mitigate a real problem than it is to prevent a hypothetical future problem. So convince them to mitigate this "real" problem that governments can see your dictionary queries, and hopefully that might prevent an actually real future problem.


> This kind of cartoonish exaggeration of how totalitarian regimes operate only muddies the water on the actual harm done by such governments.

What do you think you know about "how totalitarian regimes operate"? Worked for some, have you?

What we do know, from well-sourced documents, is that real-time XKeyscore-type filtering and flagging of every byte of unencrypted data occurs at ISPs, and that this is common practice in our own non-totalitarian nations. What do you imagine goes on in less savoury dictatorships to whom we sell the same intercept equipment and DPI firewalls?

To describe that landscape as "cartoonish exaggeration" seems either naive or disingenuous.


XKeyscore is/was used to fight terrorism and it's only available to federal agencies, not local law enforcement. Abortion isn't prohibited on a federal level and it's not a national security concern. It doesn't explain why people should avoid using GNOME Dictionary at all.


> Nobody's getting thrown into a re-education camp for looking up a word in a dictionary.

Tell me you've never been in that situation without telling me you've never been in that situation. Check your privilege.

I was hauled in front of the police, and they went through my search history. I'm innocent and still thought I was going to get the chair.

Every little search is taken out of context and the worst intentions are assumed.

I honestly can't stand this kind of flippant attitude from people who've never experienced anything remotely like this. And now it's even worse for, say, American women looking for an abortion.


Not because of using a dictionary, but it could be used as one part of your profile.

Or it could be used as a trigger for further observation of your web activities.


> This seems like pointless scaremongering to me.

One man's infosec advisory on attack vectors is another man's scaremongering. Hand waving around a problem does nothing to address it.


Your view might be less sanguine if you were, say, a woman in Texas needing an abortion.


[flagged]


> Are you a Texas woman looking for abortion

No, I'm not (and I wouldn't tell you if I were!) I don't need to be directly affected to have an opinion. "Either you are directly affected, or you're a virtue-signalling ghoul" is a false dilemma, and I say your argument is dishonest.


It's no less dishonest than your unsupported scaremongering. You literally did what you're accusing me of doing.


Even searching for the "wrong" things on Google can get you in trouble:

"How Google searches for 'pressure cookers' and 'backpacks' led the cops to a writer's door"

https://www.theverge.com/2013/8/1/4580654/michele-catalano-g...


Maybe not this specifically, but people have been and are being thrown into reeducation camps for trivial reasons. https://www.shahit.biz/eng/


People are judged for looking up terms related to their accused crimes all the time.


Apparently authorities reverse keyword search Google search histories to find suspects.

June 30th: https://www.nbcnews.com/news/us-news/police-google-reverse-k...

> In documents filed Thursday in Denver District Court, lawyers for the 17-year-old argue that the police violated the Constitution when they got a judge to order Google to check its vast database of internet searches for users who typed in the address of a home before it was set ablaze on Aug. 5, 2020.

Discussion: https://news.ycombinator.com/item?id=31938350


I agree it's unlikely anyone will ever care, but you've got to admit it's insane and surprising that a dictionary app in 2022 looks up results online using a Gopher-era plaintext protocol rather than just bundling the dictionary.


Indeed, such levels of paranoia cannot be healthy. If you're worried that somebody might find out that you looked up a word in a dictionary, imagine what they could find if they started snooping into your trash.


Why doesn't the author just tell people to go to https://dict.org in a browser? It's the default backend for these apps, and it offers SSL, POST queries, and minimal, fast output. The only JavaScript on the site is an old, useless widget from a decade ago that bounced you to the Internet strike of 2012. It works well even in elinks/links in a terminal if that's your thing, and you could probably whip up a cURL alias in minutes.


A few minutes in Firefox's network logger reveal that https://dict.org does not expose the search term in the URL, unlike all of:

  https://ahdictionary.com
  https://www.collinsdictionary.com
  https://www.merriam-webster.com
  https://en.wiktionary.org
Furthermore, the last three (i.e. all but ahdictionary.com) issue an HTTPS request on every keystroke, as needed for autocomplete.

Note, however, one tripping hazard with dict.org: unencrypted port 80 (i.e. "http://dict.org") is functional and does not redirect to "https://dict.org", and Firefox responds to a bare "dict.org" in its search box by trying "http://dict.org" first. Firefox then presents the dict.org home page upon getting the port 80 success response.
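A rough Python sketch of the safer habit described in this comment: send the query to dict.org over HTTPS in a POST body, so it never appears in the URL, and refuse plain http:// outright since the server will not redirect you. The path /bin/Dict and the form fields Form, Query, Strategy and Database are assumptions modelled on dict.org's HTML form, not a documented API, so check the live form before relying on them.

```python
from urllib.parse import urlencode, urlsplit
from urllib.request import Request

# Assumed endpoint, modelled on dict.org's web form; not a documented API.
DICT_ORG_URL = "https://dict.org/bin/Dict"


def build_lookup_request(word: str, url: str = DICT_ORG_URL) -> Request:
    """Return a POST request that carries the word in the body, never the URL."""
    if urlsplit(url).scheme != "https":
        # http://dict.org answers without redirecting, so reject it explicitly.
        raise ValueError(f"refusing unencrypted URL: {url}")
    body = urlencode({
        "Form": "Dict1",    # assumed form identifier
        "Query": word,
        "Strategy": "*",    # assumed: match strategy
        "Database": "*",    # assumed: search all databases
    }).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    req = build_lookup_request("hacker")
    print(req.get_method(), req.full_url)  # the word is not part of the URL
    # To actually send it: urllib.request.urlopen(req).read()
```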


Why do we care about exposing the search term in the URL on an HTTPS connection?

Regardless, Wiktionary supports POST for search; it's just not the default.


I went down a bit of a rabbit hole and noticed that wiktionary.org has an API, for example:

https://www.mediawiki.org/wiki/API:Main_page
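As a starting point with that API, a minimal Python sketch that asks the Action API for an entry's raw wikitext. The endpoint path /w/api.php and the parameters action=parse, prop=wikitext and formatversion=2 are standard MediaWiki ones, but treat the exact response shape as something to verify. Note that this is a plain GET, so the word does end up in the URL.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wiktionary.org/w/api.php"


def wikitext_url(word: str) -> str:
    """Build an Action API URL that returns the raw wikitext of one entry."""
    params = {
        "action": "parse",
        "page": word,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",  # v2 returns the wikitext as a plain string
    }
    return f"{API_URL}?{urlencode(params)}"


def fetch_wikitext(word: str) -> str:
    # Wikimedia asks clients for a descriptive User-Agent; this one is a placeholder.
    req = Request(wikitext_url(word),
                  headers={"User-Agent": "dict-example/0.1 (placeholder contact)"})
    with urlopen(req) as resp:
        return json.load(resp)["parse"]["wikitext"]
```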


A bit off-topic, but I feel like Wiktionary is a hidden champion on the internet. Some time ago I noticed that I use Wiktionary more often than Wikipedia these days. I am not an expert in the space, but I get the impression that its content (especially the multi-language aspect) outperforms any other dictionary in existence. It's an absolute treasure, without many of the problems and conflicts an encyclopedia cannot really circumnavigate.


In my usage, Collins dictionary is the best for explaining and giving background to English words. Wiktionary can't compete with that. I use it with an ad blocker and read all their articles for a given word.

IME their articles are written with learners in mind, too. I would probably count as a proficient English user who is still learning new words (well, who isn't?).

Arbitrary word for example: https://www.collinsdictionary.com/dictionary/english/gremlin


Wiktionary is great. I use it many times every day, for languages I'm learning but also to learn more about my mother tongue. I've never been able to figure out the MediaWiki API, but I've had success just scraping and parsing Wiktionary's HTML. For individual languages there are normally standard templates that structure the content in a nicely machine-readable way. Much of this can, I think, in principle be accessed directly through the API, but in my experience it's hard to discover how to do any given task with the API, whereas the HTML is right there in front of your nose, begging to be parsed.
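For the scraping approach just described, a minimal Python sketch using only the standard library's html.parser: it collects the text of every h2 heading, which on rendered Wiktionary pages has marked the per-language sections. That markup detail is an assumption (it has changed over the years, and older pages put an "[edit]" link inside the heading), so check it against a real page first.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of every heading at one level (h2 = language sections, assumed)."""

    def __init__(self, level: str = "h2"):
        super().__init__()
        self.level = level
        self.headings: list[str] = []
        self._inside = False
        self._buf: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.level:
            self._inside = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == self.level and self._inside:
            self._inside = False
            self.headings.append("".join(self._buf).strip())

    def handle_data(self, data):
        # Text inside nested tags (e.g. a <span> in the heading) is kept too.
        if self._inside:
            self._buf.append(data)


def language_sections(page_html: str) -> list[str]:
    parser = HeadingCollector("h2")
    parser.feed(page_html)
    parser.close()
    return parser.headings
```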


I've generally found that things like the part of speech type is inconsistent, so the wiktionary files require a lot of work to extract the information from them if you want to use the data for natural language processing tasks.

I've also found that things like thesaurus/synonym information is incomplete.


I don’t find the definitions to be as good as e.g. American Heritage, though.


The Mediawiki API is pretty powerful.

The annoying thing about Wiktionary (for this purpose) is that it's explicitly not machine readable, so the entries can only really be handled as free-form HTML. Parsing the Wikicode, despite heavy use of standard templates, would be very hard to do robustly.
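To illustrate why, here is a deliberately naive Python sketch that splits raw wikitext into language sections on level-2 headings such as ==English==. It is a toy under loose assumptions: it knows nothing about templates, comments or nowiki blocks, so everything inside a section is still free-form text.

```python
import re

# Level-2 headings: exactly two '=' on each side, on a line of their own.
H2 = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)


def split_languages(wikitext: str) -> dict[str, str]:
    """Map language name -> raw wikitext of that section (no template handling)."""
    sections: dict[str, str] = {}
    matches = list(H2.finditer(wikitext))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        sections[m.group(1).strip()] = wikitext[m.end():end].strip()
    return sections
```

Everything below that level (===Noun===, {{en-noun}} and friends) would still need its own, much more fragile, parsing.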


Once upon a time I had a project to do that, and it was miserable (but it was also when I was first learning to program... so I did a lot of stupid things).


Wikidata has lexemes now, which, like the rest of Wikidata, have great potential that's largely ignored, both in general and by the parent organisation.


Yeah, we'll see how the lexeme thing goes. There is also the earlier (non-WMF) OmegaWiki project that tried to do something similar.


You might find this other api more suitable: https://en.wiktionary.org/api/rest_v1/#/Page%20content/get_p...
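The link above is truncated, so which operation it points to is not certain. Assuming it is the per-term definition endpoint (/page/definition/{term}), a minimal Python sketch that fetches it and flattens the JSON. The response shape used here (language code mapped to a list of entries, each with partOfSpeech and a definitions list of HTML strings) is an assumption to check against the live docs.

```python
import html
import json
import re
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed endpoint; the link in the comment is truncated, so verify it.
REST_URL = "https://en.wiktionary.org/api/rest_v1/page/definition/{term}"
TAG = re.compile(r"<[^>]+>")


def fetch_definitions(term: str) -> dict:
    req = Request(REST_URL.format(term=quote(term, safe="")),
                  headers={"User-Agent": "dict-example/0.1 (placeholder contact)"})
    with urlopen(req) as resp:
        return json.load(resp)


def flatten(payload: dict, lang: str = "en") -> list[tuple[str, str]]:
    """Turn the (assumed) response into (part of speech, plain-text definition) pairs."""
    out = []
    for entry in payload.get(lang, []):
        for d in entry.get("definitions", []):
            # Strip the HTML markup and decode entities such as &amp;.
            text = html.unescape(TAG.sub("", d.get("definition", ""))).strip()
            if text:
                out.append((entry.get("partOfSpeech", "?"), text))
    return out
```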


Much better than the API I found. Thanks.


Use Wordnet [1,2].

It's an entirely local database with decent etymology and provisions for synonyms, homonyms, semantic relations and so on.

  $ /usr/bin/wn hacker -over

  Overview of noun hacker

  The noun hacker has 4 senses (first 1 from tagged texts)
                                         
  1. (1) hacker -- (someone who plays golf poorly)

  2. hacker, cyber-terrorist, cyberpunk -- (a programmer who breaks
  into computer systems

[1] https://en.wikipedia.org/wiki/WordNet

[2] https://wordnet.princeton.edu/download/current-version
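To script the same lookup, one option is to call the wn binary from Python and parse its -over output. The parser below is a minimal sketch written against the output format shown above (numbered senses, an optional tagged-text count in parentheses, synonyms before "--", glosses that may wrap onto the next line); other output modes, and edge cases in that format, are not handled.

```python
import re
import subprocess

# "1. (1) hacker -- (gloss)": sense number, optional tag count, then the rest.
SENSE = re.compile(r"^\s*(\d+)\.\s+(?:\((\d+)\)\s+)?(.*)$")


def wn_overview(word: str) -> str:
    """Run the local WordNet CLI (assumed to be on PATH) and return its output."""
    # Exit status is not relied on here; only stdout is used.
    return subprocess.run(["wn", word, "-over"], capture_output=True, text=True).stdout


def parse_overview(text: str) -> list[dict]:
    senses, current = [], None
    for line in text.splitlines():
        stripped = line.strip()
        m = SENSE.match(line)
        if m:
            number, tagged, rest = m.groups()
            words, _, gloss = rest.partition(" -- ")
            current = {"sense": int(number),
                       "tagged": int(tagged) if tagged else 0,
                       "words": [w.strip() for w in words.split(",")],
                       "gloss": gloss.strip()}
            senses.append(current)
        elif stripped.startswith("Overview of"):
            current = None  # a new part of speech; skip its header lines
        elif current is not None and stripped:
            current["gloss"] += " " + stripped  # wrapped gloss continuation
    for s in senses:
        s["gloss"] = s["gloss"].removeprefix("(").removesuffix(")")
    return senses
```

Usage would be parse_overview(wn_overview("hacker")), with everything staying on the local machine.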


Interesting that the web site differs:

    S: (n) hacker (someone who plays golf poorly)
    S: (n) hacker (a programmer for whom computing is its own reward; may enjoy the challenge of breaking into other computers but does no harm) "true hackers subscribe to a code of ethics and look down upon crackers"
    S: (n) hack, drudge, hacker (one who works hard at boring tasks) 
http://wordnetweb.princeton.edu/perl/webwn?s=hacker


The article notes

> The WordNet Project is retired, so the dictionary will grow more out of date over time.


... as opposed to the dictionary I have on my bookshelf.


As opposed to other online dictionaries that are still updated.


I find the older definition preferable.


One of the quality-of-life things I really miss about macOS when using KDE is the dictionary quality, and in particular Apple's ability to include random technical words (aplanatic?), recognise foreign words being used out of context (I like Dansk øllen) and silently correct things like obscure German capitalisation rules.

Unfortunately, aspell / hunspell just aren't in the same ballpark – and not just because of the lack of words. I think the dictionaries are much smaller and it's harder to set up this kind of weird, but very useful, behaviour.


I also really like how easily available it is; in any app (except games), just put the cursor over any word and press harder than normal on the trackpad and a bubble with good definitions appears over the word. I don't think any system comes close in terms of dictionary ergonomics, for whatever that's worth.


There is also a difference between a dictionary used for spell checking and a dictionary for definitions: smaller (within reason) is better for the former, while larger is better for the latter.

(It is better to add an uncommon word to the user dictionary than it is to have a large number of uncommon words in the system dictionary since it is better to alert the user to a potentially misused word than to leave a misspelled word in place. Of course, this doesn't help in every case - as is evident with the common auto-correct snafus.)
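A small Python sketch of that trade-off, assuming a one-word-per-line system list (such as /usr/share/dict/words on many Linux systems) and a separate user word list: uncommon words are accepted only once the user has added them to their own list, and everything else is flagged for review. The lowercase matching and the punctuation stripping are simplifying assumptions, not how aspell or hunspell actually work.

```python
from pathlib import Path


def load_words(path) -> set[str]:
    """Read a one-word-per-line list; a missing file yields an empty set."""
    p = Path(path)
    if not p.exists():
        return set()
    text = p.read_text(encoding="utf-8", errors="replace")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def flag_unknown(text: str, system: set[str], user: set[str]) -> list[str]:
    """Return words found in neither list, i.e. the ones to alert the user about."""
    known = system | user
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return [w for w in words if w and w not in known]
```

For example, flag_unknown(text, load_words("/usr/share/dict/words"), load_words(user_list_path)), where user_list_path is whatever file the user keeps.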


> "So, why is this a problem for dictionary lookups?", you might ask. Some knowledge is forbidden knowledge, depending on your local authorities. For example, it is inadvisable to look up information about "abortion" from within some U.S. states, "war crime" in Russia, or "democracy" and "human rights" in China.

Inadvisable to look up linguistic information? Has there been a single precedent that would justify such a claim?


That's a hard issue to prove.

There obviously aren't any official statements from each state, and whistleblowers are easily silenced if the media is controlled, as it is in China for example.

Statements by other nations would have to be taken with a fuckton of salt as that could just be propaganda too.

I don't think it will be possible to be certain about this, ever.


A basic Windows 10 install takes up 32GB of disk space.

Funny that Linux distributions aren't just shipping a local dictionary to protect their users' privacy.


macOS is the only OS that ships with a dictionary, because Apple pays the dictionary publisher for the rights to it. Who'll foot the bill for the Linux dictionary?


Have you looked in /usr/share/dict on any Linux desktops lately?


Wiktionary is Free; dict.org that they're using now uses a number of freely redistributable sources: GCIDE, WordNet, Moby Thesaurus II, etc.


No, stop using them without installing the daemon and desired dictionaries. Your distro should provide them as first-class packages, Debian has for as long as I can remember.

It's fantastic having instantaneous `dict` queries at the CLI, regardless of my connectivity status. The enhanced privacy is just a bonus over this already very real benefit.
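For the CLI route, after installing the daemon and some databases (on Debian, packages such as dictd, dict-gcide and dict-wn), the dict client can be pointed at the local server with dict -h localhost. If you'd rather query it from a script, here is a minimal Python sketch of an RFC 2229 DEFINE exchange. It assumes dictd is listening on localhost:2628, does not escape quotes in the word, and handles only the status codes shown; it is not a complete client.

```python
import socket

DICT_PORT = 2628


def _lines(f):
    """Yield decoded lines from a binary file object until EOF."""
    while True:
        raw = f.readline()
        if not raw:
            return
        yield raw.decode("utf-8", errors="replace").rstrip("\r\n")


def parse_define(lines) -> list[tuple[str, str]]:
    """Parse the reply after a 150 status into (database, text) pairs.

    Each definition starts with a 151 line, its text ends with a lone ".",
    and the whole reply ends with a 250 line.
    """
    defs, db, body, in_body = [], "", [], False
    for line in lines:
        if in_body:
            if line == ".":
                defs.append((db, "\n".join(body)))
                in_body = False
            else:
                # Text lines starting with "." are sent with the dot doubled.
                body.append(line[1:] if line.startswith("..") else line)
        elif line.startswith("151 "):
            # 151 "word" database "database description"
            db = line.split('"')[2].strip()
            body, in_body = [], True
        elif line.startswith("250"):
            break
    return defs


def define(word: str, host: str = "localhost", port: int = DICT_PORT,
           database: str = "*") -> list[tuple[str, str]]:
    with socket.create_connection((host, port), timeout=10) as sock:
        with sock.makefile("rb") as f:
            lines = _lines(f)
            banner = next(lines, "")
            if not banner.startswith("220"):
                raise ConnectionError(f"unexpected banner: {banner}")
            sock.sendall(f'DEFINE {database} "{word}"\r\n'.encode("utf-8"))
            status = next(lines, "")
            if status.startswith("552"):      # 552: no match
                defs = []
            elif status.startswith("150"):    # 150: definitions follow
                defs = parse_define(lines)
            else:
                raise RuntimeError(f"DEFINE failed: {status}")
            sock.sendall(b"QUIT\r\n")
            return defs
```

Calling define("hacker") never leaves the machine, which is the privacy bonus mentioned above.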


>For example, it is inadvisable to look up information about abortion from within some U.S. states, war crime in Russia, or democracy and human rights in China.

Not sure about Russia or China, but it is not illegal to look at the definition or spelling of abortion in the US.


It's true, but the protocol is not very relevant because, you know, while Goolag Translate uses SSL/TLS, it still collects your queries, and being logged in makes things even worse. So the only solution is to have an offline dictionary.*

* One that does not send out "usage statistics". If it's a paper book, make sure not to leave fingerprints or any indication of usage :)


Why does the DICT protocol even exist anymore? I don’t see any reason it should be used versus a normal HTTP GET or POST.


If you need dictionary lookups, I suggest GoldenDict with offline dictionaries.


I wonder, how many users are still using "Dict" apps?



