From the indentation, I initially concluded that it was either code (I assumed HTML) or nested quotes.
At first I examined the hypothesis that the "i" characters are delimiters between letters, but then DanWL shared a tip:
"NONwWateriRRNRRiNNANNiNNANNiRRNRRndwANNNAiARRRAiANNNAnNON" = "Hi"
Comparing this example with the full encoded text, I assumed the following:
- "i" is not a delimiter between encoded letters. Instead, a letter is encoded as "XXXXXiXXXXXiXXXXX", and "ndw" is the delimiter between letters. That also fits the text inflation ratio well: 1 original letter = 12-13 letters of encoded text.
- Almost all lines start and end with some kind of "NON", which may be garbage or start/end markers.
- "Water" is probably some random noise :)
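To make these assumptions concrete, here is a rough Python sketch run against DanWL's example. The regex and the name extract_codes are my own illustration, not anything taken from the original script, and the sketch assumes that the 5-character groups never contain "i" themselves.

```python
import re

# Example shared by DanWL; it is said to encode "Hi".
encoded = "NONwWateriRRNRRiNNANNiNNANNiRRNRRndwANNNAiARRRAiANNNAnNON"

# Assumption: "ndw" separates encoded letters.
chunks = encoded.split("ndw")

# Assumption: a code is a run of 5-character groups joined by "i",
# and the groups themselves never contain "i".
CODE_RE = re.compile(r"[^i]{5}(?:i[^i]{5})+")


def extract_codes(text):
    """Return every i-joined run of 5-character groups found in text."""
    return CODE_RE.findall(text)


codes = extract_codes(encoded)
print(len(chunks))  # number of chunks between "ndw" delimiters
print(codes)
```

Splitting on "ndw" gives two chunks, which agrees with the two letters of "Hi". Note that this regex picks up "Water" as the first group of the first code simply because it is 5 characters followed by "i"; whether it is really noise is still an open question.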
Next I used a regex to extract all possible 'XXXXXiXXXXX' encoded words and checked whether their counts and frequency distribution looked like those of letters. I got 47 unique "i"-codes with different frequencies. I have some experience with word/letter games, and the distribution I got roughly resembled that of English letters (https://www3.nd.edu/~busiforc/handouts/cryptography/letterfrequencies.html).
I also initially assumed that the 47 encoded letters were 26 uppercase + 26 lowercase letters minus the unused ones. Later this turned out to be probably false (needs checking).
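The counting step can be sketched like this in Python. The sample list is toy data I made up to stand in for the codes extracted from the full text, and rank_codes is a hypothetical helper name.

```python
from collections import Counter


def rank_codes(codes):
    """Return (code, count) pairs, most frequent first."""
    return Counter(codes).most_common()


# Toy data standing in for the codes pulled out of the real encoded text.
sample = [
    "AAAAAiBBBBB", "CCCCCiDDDDD", "AAAAAiBBBBB",
    "AAAAAiBBBBB", "CCCCCiDDDDD", "EEEEEiFFFFF",
]

ranking = rank_codes(sample)
for code, count in ranking:
    print(code, count)
```

The number of entries in the ranking is the number of unique codes (47 in my case), and the shape of the counts is what can be compared against an English letter frequency table.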
OK, if the original text is encoded via some kind of substitution, we need to identify some of the substitutions. First I replaced all those codes with shorter ones. So instead of "NNNNCiNNNCNiNNCNNiCANNNiNANNNiHANNNiNNHNNiNNNHNiNNNNH" I would get text consisting of something like ".10001" (where the number after "." corresponds to the frequency rank: ".10001" is the most frequent, ".10002" the second most frequent).
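A minimal Python sketch of that replacement step, under the same assumptions about the code format as above (5-character groups joined by "i", no "i" inside a group). The helper names build_tokens and shorten and the toy input are mine, for illustration only.

```python
import re
from collections import Counter

# Assumption: a code is a run of 5-character groups joined by "i".
CODE_RE = re.compile(r"[^i]{5}(?:i[^i]{5})+")


def build_tokens(text):
    """Map each code to a short token; ".10001" is the most frequent code."""
    counts = Counter(CODE_RE.findall(text))
    return {
        code: f".{10001 + rank}"
        for rank, (code, _count) in enumerate(counts.most_common())
    }


def shorten(text, tokens):
    """Replace every code in text with its short token."""
    return CODE_RE.sub(lambda match: tokens[match.group(0)], text)


# Toy input: three codes separated by the assumed "ndw" delimiter.
toy = "AAAAAiBBBBBndwCCCCCiDDDDDndwAAAAAiBBBBB"
tokens = build_tokens(toy)
short_text = shorten(toy, tokens)
print(short_text)
```

In this sketch the "ndw" delimiters are left in place; they could be stripped afterwards to get a continuous run of tokens.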
After that I searched for words with double letters to try to guess the letters, and found the word ".10007.10001.10002.10002.10001.10008.10006". It consists of the most frequent letters and has a double letter in it, with the same letter before and after. I became sure that the letter ".10002" is "t" and the word is "letters".
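This kind of guess can be checked mechanically by comparing repetition patterns. The Python sketch below is only an illustration: letter_pattern is a hypothetical helper and the candidate word list is made up.

```python
def letter_pattern(symbols):
    """Map each distinct symbol to the index of its first appearance."""
    first_seen = {}
    return tuple(first_seen.setdefault(s, len(first_seen)) for s in symbols)


# The encoded word from the text, split into its tokens.
encoded_word = ".10007.10001.10002.10002.10001.10008.10006".split(".")[1:]

# Made-up candidate list; a real search would use a full word list.
candidates = ["letters", "betters", "matters", "pattern"]

target = letter_pattern(encoded_word)
matches = [word for word in candidates if letter_pattern(word) == target]
print(target)
print(matches)
```

The pattern alone does not single out one word ("betters" fits too), so the frequency ranks of the tokens are still needed to settle on "letters".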
So I started filling in a substitution dictionary, starting with the letters in the word "letters" :) and trying to find new words in the encoded text.
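Roughly, the partial dictionary can be applied like this in Python. The mapping holds only the five letters taken from "letters"; the decode helper, the "?" placeholder, and the token "10003" in the second example are my own illustration.

```python
# Letters recovered from the word "letters".
mapping = {
    "10007": "l",
    "10001": "e",
    "10002": "t",
    "10008": "r",
    "10006": "s",
}


def decode(tokens, mapping, unknown="?"):
    """Decode known tokens and mark the rest with a placeholder."""
    return "".join(mapping.get(token, unknown) for token in tokens)


word = ".10007.10001.10002.10002.10001.10008.10006".split(".")[1:]
print(decode(word, mapping))

# A partially known word: "10003" is not in the dictionary yet.
print(decode(["10002", "10003", "10001"], mapping))
```

Words that come out with only one or two "?" are the easiest places to guess the next letter and grow the dictionary.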
At some point I noticed that there were 26 lines in the encoded text that had similar beginnings, each with one decodable letter, but that letter was different each time. After some thought, I realized that it was a dictionary with the 26 English letters as keys and some stuff as values.
From that I was able to identify all 26 English letters and understood that it was JavaScript code. Later I found a dictionary of non-letter symbols, which helped a bit, and some punctuation marks became obvious too.
So right now I have the script 80-90% decoded and a substitution dictionary for all but a few code symbols (probably something like "+"). The issue for me is that the script has complex functions to encode the text, with things like arrays, nested JS loops, and functions that augment encoded letters with pseudo-random prefixes and suffixes. That is probably a job for those who know JS.
Basically, the final push should be to debug the script to understand how the algorithm works and to deal with that pseudo-random stuff. I have sent you a DM with the currently decoded script and the substitution matrix.
Edited 4/10/2022 11:06:19