COMPLETED (33208) - [BUG] HTML validator broken/Unicode

7k · January 18, 2012

As it was thoroughly ignored over there, here's a dedicated thread, so it can be ignored even better:

Entering the following into a cache listing:

<foo>test</foo>
<foo>test</foo>
<foo>test</foo> [line butchered by the forum, see here for the raw text: http://pastebin.com/2TmR5UVe ]
&amp;lt;foo&amp;gt;test&amp;lt;/foo&amp;gt;

instantly gets translated into this upon submission:

test test
<foo>test</foo>
<foo>test</foo>

So, you have to triple escape the HTML entities to make them show up. Additionally, double escaping allows you to inject disallowed HTML code (such as <foo>). Submitting the form a second time would put the code through a second round of butchering and mess it up even more.
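The symptoms described above are consistent with a validator that decodes entities once, strips tags, and then decodes a second time. Here is a minimal Python sketch of that model. The names `escape_n` and `buggy_validator` are hypothetical, and the decode/strip/decode order is an assumption inferred from the reported behaviour, not the site's actual code.

```python
import html
import re

# Naive tag matcher, good enough for this illustration only.
TAG_RE = re.compile(r"<[^>]+>")


def escape_n(text: str, n: int) -> str:
    """Escape HTML special characters n times in a row."""
    for _ in range(n):
        text = html.escape(text, quote=False)
    return text


def buggy_validator(submitted: str) -> str:
    """Hypothetical model of the bug: decode, strip tags, decode again."""
    decoded = html.unescape(submitted)    # first decode
    stripped = TAG_RE.sub("", decoded)    # remove anything that looks like a tag
    return html.unescape(stripped)        # second decode (the assumed flaw)


raw = "<foo>test</foo>"
for level in range(4):
    submitted = escape_n(raw, level)
    print(level, repr(submitted), "->", repr(buggy_validator(submitted)))
```

Under this model, unescaped and single-escaped input are both reduced to `test`, double-escaped input comes out as a live `<foo>` tag, and only triple-escaped input is stored as displayable entities. Passing a stored result through `buggy_validator` again, as a second save would, strips it once more.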

7.4k · January 18, 2012

Although there was no direct response in that thread, the engineering team was notified of the issue and is investigating.

2.8k · January 19, 2012

Hi, there have been many threads about this, and promises to fix it. I'm talking about characters that are not available in cache descriptions, like "ąęśćńźżł" and probably many more that are used in some European alphabets. This problem was fixed in TB descriptions, but not in cache descriptions. Is there any chance of fixing it? What is the technical problem with it?

I'll be grateful for any answer from Groundspeak Lackeys.

This is one of the reasons why people in some countries prefer local caching to the geocaching.com website...

7k · January 19, 2012

The current problem is already explained elsewhere. You have to use unicode HTML entities, such as & #261; (without the space) for the letter ą. However, the HTML validator is currently screwed up, and you have to enter the entity as &ą instead, and make sure you save/submit it only once (it will get messed up again if you save more than once).
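For reference, the numeric entity for a character is just its Unicode code point in decimal. A short Python sketch of the conversion follows; the function name `to_numeric_entities` is illustrative, and this shows only the plain single-level entity, not the extra escaping the broken validator currently needs.

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal HTML character reference."""
    # The "xmlcharrefreplace" error handler emits &#NNN; for each character
    # that plain ASCII cannot represent and leaves ASCII characters untouched.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_numeric_entities("ą"))       # &#261;
print(to_numeric_entities("zażółć"))
```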

So far, the problem has barely been acknowledged, let alone fixed, despite the fact that it allows you to inject disallowed HTML code into your listings...

Edited January 19, 2012 by dfx

2.8k · January 19, 2012

and make sure you save/submit it only once (it will get messed up again if you save more than once).

So it's completely useless.

But TB listings were fixed, which is even stranger to me...

7.4k · January 20, 2012

Work on correcting this issue has begun and the issue should be fixed in our next release.

7k · January 20, 2012

What? No hotfix for such a serious issue? :unsure:

1.2k · January 20, 2012

Now that Seattle is starting to thaw out from our Snowpocalypse 2012 and we've all made it back into the office we will be doing a hot-fix this afternoon.

-Raine

January 21, 2012

hello Groundspeak!

I'd like to renew the really old topic on stripping the national characters out of the cache descriptions when creating or editing cache listings.

by saying "really old" I mean the issue was first reported 6 YEARS ago here and then 2 YEARS later here.

both topics are now closed and the last Groundspeak news was you were aware of the issue, were sorry for that and were planning to fix this.

that was in August 2009...

today, another 2.5 YEARS LATER, the bug still seems to be there.

could you please clarify what the current status is? has anything been done? is it still being planned? or maybe declined?

I'm from Poland, and when posting a new cache listing I try to use the 9 Polish diacritic characters (18 counting case), unfortunately with poor results. The only Polish character that gets accepted is "ó" - that's sad..

I guess the problem applies not only to Polish but to speakers of any other non-English language around the world who would like to use their native language and characters in cache listings. and since the geocaching.com site is totally global now, I think we deserve more attention and more effort in solving this global issue too.

really looking forward to hearing some good news from you on this soon!

kind regards!

7k · January 21, 2012

The current problem is described here: http://forums.Groundspeak.com/GC/index.php?showtopic=288981

(the promised hotfix didn't happen)

In short, cache pages are in unicode, but the edit page entry box doesn't accept UTF-8. The workaround is to encode the unicode characters as HTML entities, but with the broken HTML validator you have to encode them thrice.
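Encoding "thrice" can be sketched in Python as one pass to numeric entities followed by two extra rounds of ampersand escaping. This assumes the validator decodes submitted text twice before storing it, which is inferred from the symptoms in this thread rather than confirmed. The function name `encode_thrice` is hypothetical.

```python
import html


def encode_thrice(text: str) -> str:
    """Encode text so that two rounds of entity decoding leave plain numeric entities."""
    # 1st encoding: non-ASCII characters become numeric references (ą -> &#261;).
    encoded = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
    # 2nd and 3rd encoding: escape every ampersand twice more.
    for _ in range(2):
        encoded = encoded.replace("&", "&amp;")
    return encoded


submitted = encode_thrice("ą")
print(submitted)                                # &amp;amp;#261;
print(html.unescape(html.unescape(submitted)))  # &#261;
```

Because each save would decode the text again, the result only survives a single submission, which matches the save-only-once caveat.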

January 21, 2012

[...]we will be doing a hot-fix this afternoon.

-Raine

Apparently this broke my cache listing, which included HTML-entity-encoded Japanese characters. When saving the page, the Japanese characters get decoded and are shown in the edit box. In the cache listing, there are only "?". Now, since this is a mystery cache and part of the hints are in the Japanese text, this is really bad. Please fix immediately.

January 21, 2012

The current problem is described here: http://forums.Groundspeak.com/GC/index.php?showtopic=288981

well, I wouldn't say the HTML validator is the root cause of the problem.

In short, cache pages are in unicode, but the edit page entry box doesn't accept UTF-8.

so this is the real problem here. but as OpinioNate said here, Groundspeak was already planning to replace this non-UTF text box back in 2009..

so is it really still the case??

can we hear some official news from Groundspeak team please?

The workaround is to encode the unicode characters as HTML entities, but with the broken HTML validator you have to encode them thrice.

well, I can't imagine how you would explain to a non-technical person (as probably 99% of geocachers are) that instead of simply writing their national characters they need to generate some black magic HTML entities stuff and find/replace it in their cache descriptions before posting..

IMHO this is not an option at all.

7k · January 21, 2012

IMHO this is not an option at all.

I agree that it's just an ugly kludge, a workaround, and not a solution.

7k · January 24, 2012

# twiddles thumbs #

1.2k · January 24, 2012

Instead of applying another band-aid to this situation, we're in the process of preparing to upgrade our database tables to support Unicode characters. I've delayed the hot-fix until we can perform regression testing on these changes.
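For readers wondering why characters end up as "?" when a table does not support Unicode, here is a small Python illustration. It assumes a single-byte, Windows-1252-style column purely for demonstration, since the thread does not say which database or encoding the site uses. The function name `store_in_legacy_column` is hypothetical.

```python
def store_in_legacy_column(text: str, codec: str = "cp1252") -> str:
    """Simulate a round trip through a column limited to a single-byte codec."""
    # Characters the codec cannot represent are replaced with "?" on the way in,
    # so the original text cannot be recovered on the way out.
    return text.encode(codec, errors="replace").decode(codec)


print(store_in_legacy_column("ąęśćńźżłó"))  # ????????ó
print(store_in_legacy_column("한국어"))      # ???
```

Under that assumption, "ó" survives because Windows-1252 happens to include it, while the other Polish letters and all Korean characters are lost.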

7k · January 24, 2012

Well, mangling the entities isn't really unicode related. Is it gonna fix both?

1.2k · January 24, 2012

Yup

January 25, 2012

hello Groundspeak!

I'd like to renew the really old topic on stripping the national characters out of the cache descriptions when creating or editing cache listings.

(...)

hello? can we expect any answer please?

7k · January 25, 2012

over here

7.4k · January 25, 2012

I'm merging duplicate threads.

1.1k · February 3, 2012

I've also run into this problem today.

Rgds, Andy

February 6, 2012

I have made a cache listing, and wanted to include Korean.

Usually we convert the Korean into Unicode and then include it in the listing.

The result should be Korean characters in cache listings.

However, the Unicode is broken and has turned into just "????".

Maybe recent upgrades of the site caused it.

Please check it for me.

February 13, 2012

Please address this bug as soon as possible.

4 of the 13 puzzle caches in my published Lost Cities puzzle series rely on Unicode to support the foreign language text of their puzzles. When I revised the web pages today to remove a temporary advisory, the Unicode was stripped, and the foreign language text of the puzzles now displays as all "???".

"Lost Cities" is a very popular series and it is a shame that cachers won't be able to pursue it due to these technical problems.

This is the second bug from recent upgrades that has corrupted my cache pages. From now on, every time that there is a Geocaching.com site update, I'll cringe at the thought of what might happen to my cache pages.

Edited February 13, 2012 by Lati.dude

1k · February 14, 2012

Until Groundspeak fixes this you can display the foreign characters as an image. It might not be readable on some paperless GPS devices, but then again they probably couldn't display Unicode characters in the first place.

Edited February 14, 2012 by Ambient_Skater

February 14, 2012

Until Groundspeak fixes this you can display the foreign characters as an image. It might not be readable on some paperless GPS devices, but then again they probably couldn't display Unicode characters in the first place.

Thanks for the suggestion Ambient Skater.

Unfortunately, the key to solving my puzzles that rely on foreign language Unicode is to highlight and copy the foreign characters on a PC or Android, then paste them into Google Translate or run a direct Google search. That is why I used Unicode in the first place. Pretty hard to transcribe them from an image into the appropriate foreign language unless you have the right keyboard. :wacko:

I tried re-coding the HTML for my Unicode characters per the suggestion from dfx above (e.g. "&ą"), but without success. Anyone have a workaround until Groundspeak fixes this bug?

Luckily, I saved copies of the original HTML for all my pages prior to the 1/17/2012 update. The new Groundspeak HTML warning should read: To prevent lost data, we strongly recommend you save an offline copy of your cache description before submitting to the website or making any revisions thereto.

7k · February 14, 2012

I tried re-coding the html for my Unicode characters per the suggestion from dfx above (e.g. "&ą"), but without success.

You have one "&" too many in there. To convert ą into something that works, you need to make it &#261;

7.4k · February 14, 2012

Lati.dude, this issue is expected to be corrected in today's release.

February 14, 2012

Lati.dude, this issue is expected to be corrected in today's release.

EXCELLENT!!! I look forward to being back in business.

February 15, 2012

It's really good that Unicode is now accepted in the description of caches. But why does it not work in logs?

7.4k · February 15, 2012

Cache logs are a separate table in the database and have yet to be converted.
