Seems like this is based on Google's own internal code search tooling, something most engineers at Google rely on for everyday code-level work.
I personally can't even begin to imagine how I'd navigate the gigantic codebase without it.
It’s also used for https://source.chromium.org. I now host my monorepo on Cloud Source Repositories because it has a super nice integration with the rest of their products.
What is the constant phone-home activity on that opaque container they ship as Sourcegraph? It is occasionally the case that devs have too-fast machines, so their code never gets exercised on ordinary equipment. With Sourcegraph and other inner-network dev tools, the amount of chatty traffic and build dependencies seems seriously off-putting, trending toward useless on an ordinary network.
This seems like a bad attitude. Perhaps you could constructively ask for a sourcegraph-lite that does less, in return for fewer deps and less networking complexity?
I am a dev at Sourcegraph, I'd be very open to any feedback.
You can firewall off Sourcegraph 100% for complete confidence, and aside from the first admin's email address (so we can notify them of any security updates) we only send back aggregated anonymous usage statistics which we are extremely transparent about: https://docs.sourcegraph.com/admin/pings
You are 100% correct, I really messed up here by suggesting that option. I misread our own docs. It would only disable event counts from being sent (e.g. instead of "how many jump-to-definitions were performed in a day?" we would just send a boolean "did one or more jump-to-definitions occur in a day?" based on my reading of the code[1]) -- not what I thought it did. Will send a PR to clarify the docs on this so I don't mess up like this again.
I'm human and screw up, frequently; this instance just happened to be on the ridiculously important topic of privacy -- hopefully you will forgive me for that, I wasn't trying to be malicious but certainly in retrospect I can see this being interpreted as such.. :/
The right option to turn it all off is just this one: since we only send ping data as part of the version update check, you disable that and it's all off. And you can confirm this in the code, as I just did here[2][3]: https://docs.sourcegraph.com/admin/config/site_config#update... And as I mentioned previously, you can always firewall off Sourcegraph 100%.
As an aside, I can promise you that I wouldn't have continued to work at Sourcegraph for the last 5 years if I thought our business was selling or collecting identifiable user data in ANY form. We only collect just enough information to help prioritize what features we improve and (aside from the first admin's email as I noted already above) it is all 100% anonymous and aggregated numbers that we are extremely transparent about[4]. Our person running analytics is also constantly trying to make this more transparent[5] because we all are very security and privacy aware and know the #1 way to convince people to not run software is to make them think you are spying on them or using their data in ways they would not want.
It's obvious to me this should be more clear in our docs, I'm going to forward all of this conversation onto the rest of our team to make sure we improve our docs here.
If you already know how to index, this is a completely open source alternative, likely with fewer bells and whistles.
I worked at Google and miss Code Search. But I also have lots of ideas for how one can go beyond the status quo for code reading and debugging. Join if interested.
Googler here. We have the same Code Search tool internally, this is honestly one of my favorite things about working at Google. Great to see this open sourced.
This is missing tons of functionality and layers that the internal one has, though, like all of the automatic code analysis and linting, coverage and fuzzing integration, etc.
They seem to have reduced the information density and killed readability as well with the "material ui" redesign. The old code search UI was perfect, with enough contrast to allow you to quickly grep through xrefs to locate relevant entries. The page itself was lightweight as well.
Compare that to the redesign where each xref jump has me staring at a spinner for half a second, the xref bar has no visual separation between type, filename, and code snippet, all buttons visually indistinguishable with a blue on white color scheme, the "layers" dropdown is replaced with a mishmash of buttons scattered across the layout, etc.
I really hope they're not forcing this abhorrent redesign on their developers as well.
Really loved this interface (also cs.chromium.org) while I worked at Google. It was easy for me to orient myself, find what uses this or that and where it's being used, and then it had a whole "debugging" facility:
You select your binary on Borg (think Kubernetes/Docker), and it'll fetch from the binary the CL (think Perforce changelist) it was built at, and/or additional cherry-picked CLs, then it'll somehow go back in time and present how the source code looked then.
Later (I tried it in Java, but I believe it's available for other languages too) you could inject statements right at the beginning of a function (a kind of breakpoint), and that statement could be something like "let's log how this function was called" - you were able to reference nearby statements. This could be set from the command line, and took a bit of mastery (I was a bit afraid the first time using it; it had a chilling effect on me), but then my task (with 10 or 11 instances) reported these log lines, and I was able to see them in the browser.
(I have no experience with GCP, or the public face of Google Cloud, so I don't know what's available there), but this was freakin cool.
That "codesearch" is only superficially related to this one. The main feature of _this_ codesearch that makes it so useful is the cross references to callers, callees, and overrides. Ye olde codesearch has more in common with things like livegrep.
> The main feature of _this_ codesearch that makes it so useful is the cross references to callers, callees, and overrides. Ye olde codesearch has more in common with things like livegrep.
However this part of internal codesearch is the one part that is actually (partially) open sourced: kythe.io
Working with chromium/v8, I can honestly say google's code search infra is one of the most valuable resources available. I really hope they open source the backend at some point.
The backend is open sourced, it is Kythe.io. It supports Go, C++, and Java out of the box, for some definition of out of the box. Maybe even TypeScript. Also, cross-references for protobuf-generated code work if you make the stars align ;)
As for UI, treetide/underhood I mention elsewhere is the only open option now.
But Kythe comes with command line utils and an API you can query directly as well.
What is missing from the open source is a production-ready parallel serving table builder. There is one in Go which uses Apache Beam, but last time I checked the Go workers are not well supported on the Flink runner. It didn't even work properly on the GCP runner. Hope this will change.
Question for Googlers or others: What do you think is the most well-written piece of software produced by Google? I would like to study how the world's best engineers write code. (Preferably C++, as it's the language I'm most familiar with.)
By necessity, Abseil is full of dark template magic that would very rarely be used elsewhere in the codebase. That's the point - it encapsulates a lot of useful abstractions and allows them to be used without the client code author thinking about the guts of the abstraction. But it makes it pretty unusual relative to typical Google C++.
True for much of it, but if you look at something like cord.h, it's almost free of template programming. Google C++ application code isn't all that spiffy, to be honest. I would say most of the code is dedicated to stuff that nobody outside of Google is going to care about. I think the base libraries are more interesting.
Note that you'd only be seeing the final result, not the whole process by studying source code. Also, I'd say definition of good code varies by domain.
Nice. I'm grateful for this being posted on HN, because discoverability of that page seems to be zero (I couldn't find any link to it from opensource.google). It doesn't even have a page title, so googling for it would be more complicated too.
I don't know the state of it, but Kythe is open source: https://kythe.io/
But in reality you probably want something more like SourceGraph which packages everything up nicely so that you don't need to worry about it, or something more specialized.
cs.opensource.google runs on Google's search infrastructure, so it's unlikely to be open sourced. https://github.com/google/zoekt is open source, but lacks cross-referencing, and has a more spartan UI.
If you want to do code search on your private git, mercurial, svn, cvs and other repositories try a fully open source opengrok (https://github.com/oracle/opengrok).
It’s easy to self-install and use, with good documentation, and as an added bonus it's very fast.
If you think about Google's codebase size, an IDE wouldn't cut it. You could load and analyze dependencies/imports as you go, but that would make for a terrible user experience (think about IntelliJ indexing task every time you want to check the definition of something).
Also, Code Search has a lot of goodies baked in: a history layer, cross-references, call sites, ... and it's snappy. Moreover, it's really well integrated with all the other internal tools used for coverage, code analysis, issue tracking, the web text editor, and so on.
I think an IDE (like IntelliJ IDEA) can't reach that level of integration with several other systems unless you fully buy into the ecosystem a company like JetBrains proposes (their issue tracker, their code review tool, ...).
So, summarizing, it's a tool made by Googlers for Googlers' needs, and it's amazing to use every day for all the above reasons.
You can search for non-explicit dependencies. E.g. if you're removing a command-line flag from a C++ binary, you can search for all uses of that flag across all users of your binary to make sure it is safe to remove.
Nice to see GN there. I wish more people knew about it.
For me it's as powerful as Bazel, but without the need for a JVM and all the insanity that comes with it in a desktop/dev environment.
The syntax is great and powerful (insane customization), and together with Ninja there's nothing like it.
It's in C++, and even being as powerful as Bazel, it's a light, standalone tool that can handle a huge amount of source code, dependencies, tools, and configurations.
Having tried to battle GN configs... I don't agree.
I was working on a big source tree and got frustrated that it kept rebuilding files that hadn't changed just because I switched git branches to look at one file, and then suddenly "Yay, another 18 hour full rebuild!".
I tried to fix it and found there is no option to ignore file timestamps, and someone has tried to patch it to do that[1]... But the patch requires putting an option in GN files, which seems to break them wherever I put it... I tried to patch GN, but it wouldn't ever seem to pass that option through... Ended up patching Ninja to always have the option on, but then random other operations broke (like simple file copies).
A day wasted, and problem not solved. Maybe my use case isn't common, or a bad workman blames his tools, but for me at least it wasn't a nice experience.
Sidebar question - anyone know how they've made the interaction/animation on this page [1]? It feels like a great way to show a lot of info in a concise way.
Agreed, it is a very nice little interaction! It seems like they're animating the bubbles around a circle while randomly fluctuating the speed and radius at which they rotate. Clicking on a bubble centers it by setting the rotation radius to `0` and expanding the size.
Would be interested to know how they expand the bubbles as your cursor moves closer.
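Just guessing at the mechanics described above, the per-frame math could be sketched like this (Python for illustration only; the class, easing factor, and fluctuation range are all made up, and the real page is presumably JavaScript):

```python
import math
import random

class Bubble:
    """A bubble orbiting a center point with gently fluctuating speed;
    'selecting' it eases the orbit radius toward 0, centering the bubble."""

    def __init__(self, cx, cy, radius, speed):
        self.cx, self.cy = cx, cy
        self.base_radius = radius
        self.radius = radius
        self.angle = random.uniform(0, 2 * math.pi)
        self.speed = speed  # angular speed, radians/second
        self.selected = False

    def step(self, dt):
        # Randomly nudge the angular speed a little each frame.
        self.speed += random.uniform(-0.01, 0.01)
        self.angle += self.speed * dt
        # Ease the orbit radius toward 0 when selected, or back toward
        # its base radius otherwise (simple exponential easing).
        target = 0.0 if self.selected else self.base_radius
        self.radius += (target - self.radius) * min(1.0, 5.0 * dt)

    def position(self):
        return (self.cx + self.radius * math.cos(self.angle),
                self.cy + self.radius * math.sin(self.angle))
```

The cursor-proximity growth would presumably be the same idea: ease each bubble's size toward a target that depends on its distance to the pointer.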
First impression is that it enables discoverability of code across the open sourced Google projects, but trying to find this page even on Google search is not a thing yet. Is that intentional?
So far it doesn't seem to index a lot of stuff. I searched for some terms out of my kubernetes/openshift dependencies and it didn't find them. Is this correct?
Sourcegraph CEO here. This is the same underlying code search offered for a while by Google Cloud Source Repositories for private code, and it’s cool to see this usable for Google’s own open-source code, too.
Lots of Xooglers and current Googlers use Sourcegraph, too. Just mentioning Sourcegraph because I’ve seen several other folks mention us in the comments (thanks!).
Yes! Publish a GraphQL schema that is parseable by Apollo GraphQL, please! I tried to use your API on my company's internal Sourcegraph setup and had to hand-roll my API calls because of these errors.
Seems like this is in the works already, but the boolean operators in OpenGrok are so intuitive and powerful. I use them every single day and the lack of support in Sourcegraph immediately disqualified it for us. For example yesterday I was looking for Dropwizard Managed classes not annotated with @Singleton so I did:
Managed && !"@Singleton"
(I'm omitting the fully-qualified class name for brevity)
If I also wanted to look for HealthCheck classes I could update the query to:
(Managed || HealthCheck) && !"@Singleton"
I think it also helps that OpenGrok has a separate input for filtering file paths (completely splitting the "where" and the "what" parts of the query). And this file path search supports the same boolean operators. So if I want to narrow my search to two particular repositories I could put CrmSearch || AutomationPlatform into the File Path input. And because this input only handles file paths, I don't need to remember any special syntax. Whereas if you clump the entire query into a single input, then users need a way to tell you whether a search term applies to file paths or file contents.
Engineer at Sourcegraph here. Adding boolean operators is a priority on our roadmap, and expected to go live between May and July this year. On separate inputs: definitely something we've also identified and are actively working on. One recent experimental addition is "Interactive mode" that lets you enter patterns separately for repos, files, and patterns, and so on. There's a dropdown next to the query bar to try it out--there are some kinks, and we're currently working on making this a polished feature. Thanks for the feedback, and stay tuned!
Yes, the extension supports C, C++, and C#! Sourcegraph supports over 30 languages out of the box using our basic code intelligence (search based heuristics and ctags).
https://github.com/sourcegraph/sourcegraph/blob/master/LICEN... says: "LICENSE.apache (Apache License) applies to all files in this repository, except for those in the enterprise/ and web/src/enterprise/ directories, which are covered by LICENSE.enterprise."
Thus, Apache 2.0 and some custom license requiring you to accept the terms, have a correct number of seats, and does not allow you to "copy, merge, publish, distribute, sublicense, and/or sell the Software."
I am not sure what all of this means, though. Better check out the licenses yourself :)
It's open core (Apache 2 + some non-OSS parts for enterprise features). All of the code is public and we develop in the open at https://github.com/sourcegraph/sourcegraph.
Sourcegraph only supports git repositories, so it's not very useful for enterprises with mercurial, svn, or other version control systems.
There is another open source application for code search, OpenGrok [1] (it's completely open source, unlike Sourcegraph, and supports multiple version control systems besides git).
Take a look. It's easy to install and operate on bare metal, cloud, and containers, instead of the convoluted Sourcegraph way of Kubernetes or Docker.
Sourcegraph is open core like how GitLab and VS Code are open core. You can run "Sourcegraph OSS" and get limited features, or you can run Sourcegraph (see https://docs.sourcegraph.com/#quickstart) and get all the features, but you need a license key when you hit the user limit.
I really hate that some of the elements on the page are translated into a different language, seemingly based on my IP. When did it become acceptable to ignore my browser or my system language settings? The same thing happens on other Google services (like Google Groups), but I noticed this trend on other websites too.
This annoys me to no end. I live in Hong Kong. I speak English. We have 2 official languages here, one of which is English. I travel frequently to Japan as well, with infrequent trips to either Europe or North America.
My 'preferences' and settings are a total disaster. I end up having to go onto the gray market to buy gift cards and prepaid credit cards as I seemingly never can buy stuff online when I want to, as I'm either in the wrong place, or in the wrong language. But I know I'm still me.
What is with this "100% of people in this location read/speak the same language" assumption?
What if I want to learn Russian, but I'm in China? Why can't I just tell my computer to show me Russian, and have the browser tell the site "give me Russian if you have it"?
Why is this so hard?
I really dislike things that try to make it easy for me, as all they do is prevent me from being able to function.
The actual mechanism by which your User Agent tells the server which languages you are interested in is even more robust [0]! It is a weighted list of preferences.
The reasons I've heard from web developers for why they don't use this are that they believe the user probably never set it up right, and that multiple people could be using the same browser, so they need to be able to do the right thing anyway.
What I typically do is select the best matching language from the Accept-Language HTTP header, and then override it with a session-specific value IF one is supplied. Example:
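A minimal sketch of that approach (hypothetical function names; a real implementation would follow the Accept-Language quality-value rules from RFC 7231 more strictly):

```python
def parse_accept_language(header):
    """Parse 'en-GB,en;q=0.8,fr;q=0.5' into [('en-gb', 1.0), ...],
    sorted by descending q-weight."""
    langs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        if ";q=" in part:
            tag, q = part.split(";q=", 1)
            try:
                weight = float(q)
            except ValueError:
                weight = 0.0
        else:
            tag, weight = part, 1.0
        langs.append((tag.strip().lower(), weight))
    return sorted(langs, key=lambda lw: lw[1], reverse=True)

def choose_language(header, supported, session_override=None, default="en"):
    """Pick the best supported match from the header, but let an
    explicit session/user preference win outright."""
    if session_override in supported:
        return session_override
    for tag, _weight in parse_accept_language(header):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # 'en-gb' also matches plain 'en'
        if primary in supported:
            return primary
    return default
```

So a visitor sending `fr-CH,fr;q=0.9,en;q=0.8` against a site supporting only `en` and `de` gets English, unless they explicitly picked German in a session setting.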
You can see PART of the problem here from the web developer's perspective. This isn't a negotiation, so you have no way to know which languages the server supports. If your preferences aren't totally inclusive, you'll get something "wrong". This can be solved by exposing that information (as not done above) and allowing the user to override it (as above).
The problem is that every big website operator wants to make it work correctly for you, and they (1) have different definitions for success and (2) assume you're incompetent.
In the first point, there's someone with a requirements document that assumes every country has one official language and everyone in that country speaks that language, and so feels successful and internationalization-ready when a geo-IP served page is automatically switched to the "correct" language (much like "Falsehoods programmers believe about names").
Second, configuring a computer's locale to set a browser's request headers correctly is beyond the technical expertise of many users. It would be better if things were consistent, but at the point where some locales were set incorrectly and some were set intentionally, your analytics would have shown that you improved the situation on average by guessing the locale (screwing over the users who knew how to use their computers) rather than by respecting it and eventually getting everyone to understand how to set their desired language.
If you install a Hungarian Firefox, the Accept-Language header will reflect this (or it did when I tried it last time). Non-expert users also often choose software localized in their language. I don't have numbers, but I wouldn't be surprised if a lot of browsers were sending correct Accept-Language headers.
I don't know about IE, but it was in a very good position to guess the language of the user as well.
Agreed, but then they should give the users a way of overriding that. Google in particular knows what users want because they force you to answer it when you create your account. I've set every single location setting to UK, and yet when I travel abroad they insist on ignoring it. No excuse there.
but for some unknown reason, some components of the page are in Russian. I'm not in Russia; nothing in my browser request indicates I'd like to read Russian.
I live in the US and have a local US IP. A year or so ago, I made a site for a side project using vanilla HTML. No frameworks, no JS. Every word on the page was in English and could be found in an English dictionary.
When I first stood the site up and tested it, Chrome would always break in as soon as it loaded, with a popup to translate the site into _English_ from _Romanian_!
I was able to suppress this only by turning on every single language hint in META.
I live in Switzerland and it is really annoying, as my IP switches between the German and the French part every few weeks.
I also wonder what happens in places where people have traditionally always come from different language groups. I don't know whether there is always a single common language there.
No, it's definitely a mix, like the example in my parent comment describes. If I change my Accept header to remove the EN priority, I get the French translation that matches my French IP.
For me, on the main page it's mostly tooltips (More Elements in the navbar, Help in the searchbar) and the blue Show Project link. It's worse on the project pages where the description is in English, but the entire table (including dates) is localized.
Hasn't that been acceptable since the dawn of the web?
Lots of people log errors to some sort of monitoring system. I can't remember seeing any localisation/translation API that would log an error rather than just silently serve English. I infer from this that just serving English is universally accepted and considering it an error is so rare that I've yet to see an API that caters to it.
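For what it's worth, the API the parent describes is easy to sketch: a lookup that logs the miss instead of silently serving English (toy catalog and names, purely illustrative):

```python
import logging

logger = logging.getLogger("i18n")

# Toy message catalog; in practice this would come from .po files,
# a translation service, etc.
CATALOG = {
    "en": {"greeting": "Hello"},
    "de": {"greeting": "Hallo"},
}

def translate(key, locale, fallback="en"):
    """Return the translation for `key` in `locale`, falling back to
    English -- but logging the miss rather than hiding it."""
    messages = CATALOG.get(locale)
    if messages is not None and key in messages:
        return messages[key]
    logger.warning("missing translation: key=%r locale=%r, serving %r",
                   key, locale, fallback)
    return CATALOG[fallback][key]
```

Wiring those warnings into a monitoring system would make "we silently served English" a visible, countable event instead of an invisible default.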
I see someone downvoted, and today's a frustrating day, so let me ask for more.
English is basically the world's default language (like it or not). Sites that translate partially, but sometimes show English text instead of the language specified by the browser expect the user to understand the world's default.
A language inferred from geoip is the user's area's default language. Sites that show that language instead of that specified by the browser expect the user to understand the area's default.
These two behaviours seem really quite similar to me. Their technical backgrounds differ, but the resulting behaviour is much the same. One has been widely accepted since the dawn of the web, AFAICT, which leads me to believe that the other has been just as acceptable for just as long. And thus, my answer to "when did it become acceptable" is "it always was, you just didn't notice".
It's part of the "localization" push by governments, news/media, some consumers and tech companies themselves. I guess it's okay for passive consumers, but for tech or advanced/active consumers, it's annoying. I'm pretty sure most major sites/apps/etc all localize. So your google search results, youtube frontpage, etc will be different based on your location.
Localization is not a problem; not giving users control over their locale is. If you travel to a foreign country and suddenly can't read anything on the websites you regularly visit, that's pretty bad. If multiple languages are spoken in your country and you're forced to use one you don't speak, that's bad as well. Websites should never assume they know better which locale their users want than the users themselves.
I didn't say it was a problem for most users ( aka passive users ). I said it's a problem because they make it difficult/impossible for tech/active/advanced users to switch it.
> Websites should never assume they know better which locale their users want than the users themselves.
Yes. You just restated my comment. Not allowing tech/active/advanced users the option is the problem. I love comments that appear to debunk what you wrote but just restate it in a different way and pretend it is new.
Maybe you think that what I wrote was already implicit in your comment, but I'm still not seeing it, and since you got downvoted, evidently a few others felt the same. Next time you'd probably better write it out explicitly.
I am a grad student right now with 2 years of industry experience. Google still prefers people who are extremely good at Data Structures and Algorithms. I like doing them, but not so much to just grind them for the sake of getting into Google. I like to learn how to design big systems and grinding Data Structures and Algorithms seems like a waste of time.
I put in "only" 40 hours of refreshing on data structures and algorithms, and doing some practice coding problems, in the weeks leading up to my interview. And I got the job.
Frankly, it's been the best hourly return on investment of anything I've done in my life up to this point, by far. Assuming I wouldn't have gotten the job otherwise (which seems reasonable), each of those hours spent studying has proven to be worth several tens of thousands of dollars. I'm not exaggerating; I just did the math.
Maybe the interviewing process is broken or sub-optimal or whatever, but it is what it is, and if you can get through it by doing some additional studying, then it's absolutely worth it. Google is a good place to work on designing big systems, so if that's your interest, consider just putting in the work.
This is solid advice. Thanks! I will try to dedicate a portion of my day to brushing up on Data Structures and Algorithms, and maybe, eventually, I will get good enough to crack the interview.
Yeah, but they need people good at Data Structures and Algorithms. I think I am above average but nowhere near the quality of people that they hire. Also, I am more interested in designing big systems end-to-end and feel that doing a lot of Data Structures and Algorithms is a waste of time.
That's a bit of an oversimplification. Google's failures often garner more marketplace traction than other companies' wildest success stories.
But at the end of the day they are an advertising company. Any product that doesn't help them sell advertising -- and lots of it -- will eventually impede the progress of the careers of the managers and employees who work on it. That's when the axe falls... not when a product "fails," necessarily, but when it's no longer "sexy."
Waymo is (a) no longer affiliated with Google, and (b) basically a hobby project. It is comparable to Blue Origin for Bezos, any of Musk's numerous speculative ventures in fields from tunnel-digging to AI, or the original AppleTV for Steve Jobs.
People who work there are very well aware of that, and are OK with it. No one outside the company should allow their own business or career path to depend on Waymo, at this stage. It could vanish tomorrow at the whims of the Alphabet execs and/or directors, because their own business doesn't depend on it.
Lol Google was one company till 2015. Alphabet and Google have the same CEO. Just shows your biases in trying to prove your point. It was Google who pumped in money in Waymo. Have fun using ddg and Firefox.
That is not true at all. You are regurgitating half-baked theories of HNers who are just salty at Google (some balk at their success, some didn't get in, etc.). Google's going to keep changing the world. HN is going to keep complaining.
(I work at Google)