Things You Didn’t Know about Strings

Whether you have been coding for just a few months or a decade, you are constantly learning new things – hopefully not all of them from your mistakes. In this article, I will try to show you that even something as simple as strings of text can still surprise you with its hidden complexity and with all the things you have to watch out for in order to write efficient, bug-free code. Spoiler alert: it is possible to bring an application down with inefficient use of strings!

String garbage collection / String premature optimization

It is essential to understand that strings in .NET are immutable: every concatenation, or any other operation that appears to modify a string, creates a new object in memory (a plain assignment only copies a reference). In specific, frequently executed places of your application, this can cause performance issues. That said, don't go crazy micro-optimising every concatenation, interpolation or string.Format call: premature optimisation is one of the antipatterns you should avoid. Always pick whatever is easiest to read, unless you are working in an often executed loop; only then should you start weighing StringBuilder against concatenation and string interpolation for performance. This article explains how to detect excessive garbage collection caused by strings and other issues in applications, and why it happens behind the scenes.

TLDR: Every operation that changes a string creates a new copy of it, and that copy takes memory (strings are often among the biggest consumers of memory of all CLR objects). The garbage collector frees memory whenever your app needs it, so if you are constantly allocating, it has to keep collecting, which costs CPU time and, by extension, degrades overall performance (sometimes dramatically!).
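To make the allocation cost concrete, here is a minimal sketch that builds the same string in a loop twice, once with += and once with StringBuilder, and measures how many bytes each approach allocates. It assumes .NET Core 3.0 or later (for GC.GetAllocatedBytesForCurrentThread) and C# 9 top-level statements. The helper names ConcatInLoop and BuildInLoop are made up for illustration, and the exact byte counts will vary by runtime, so none are claimed here.

```csharp
using System;
using System.Text;

// Naive version: each += allocates a new string and copies everything built so far,
// so the total amount of copying grows quadratically with the number of pieces.
string ConcatInLoop(int count)
{
    var result = "";
    for (var i = 0; i < count; i++)
    {
        result += i.ToString();
    }
    return result;
}

// StringBuilder appends into a growable buffer and allocates the final string once.
string BuildInLoop(int count)
{
    var sb = new StringBuilder();
    for (var i = 0; i < count; i++)
    {
        sb.Append(i);
    }
    return sb.ToString();
}

// GC.GetAllocatedBytesForCurrentThread requires .NET Core 3.0 or later.
long before = GC.GetAllocatedBytesForCurrentThread();
string a = ConcatInLoop(5_000);
long concatBytes = GC.GetAllocatedBytesForCurrentThread() - before;

before = GC.GetAllocatedBytesForCurrentThread();
string b = BuildInLoop(5_000);
long builderBytes = GC.GetAllocatedBytesForCurrentThread() - before;

Console.WriteLine($"Same result: {a == b}");
Console.WriteLine($"Concatenation allocated ~{concatBytes:N0} bytes");
Console.WriteLine($"StringBuilder allocated ~{builderBytes:N0} bytes");
```

Outside hot loops like this one, the readability argument above still wins: a handful of concatenations is not worth rewriting.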

String formatting / Comparing

You need to understand how strings can be compared. Often the best bet is the String.Equals overload that takes a StringComparison, so you control exactly how the comparison is done, e.g.:

var s1 = "Strasse";
var s2 = "Straße";
s1.Equals(s2, StringComparison.Ordinal); //false
s1.Equals(s2, StringComparison.InvariantCulture); //true with NLS (.NET Framework, .NET Core on Windows before .NET 5); .NET 5+ uses ICU and may return false

Also, as a good developer, you have to understand formatting and use it when appropriate. To learn more on this topic, read this article.
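The comparison and formatting advice above can be sketched in a few lines. This is only an illustration under the assumption that you are comparing machine-oriented text (file names, HTTP header names) rather than user-facing words; IsJsonFile and ToMachineText are hypothetical helper names. All calls are standard .NET APIs, and nothing here depends on culture data being installed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Identifiers, file extensions and keys: ordinal comparison ignoring case,
// so the result never depends on the current culture.
bool IsJsonFile(string fileName) =>
    fileName.EndsWith(".json", StringComparison.OrdinalIgnoreCase);

// The same idea for dictionary keys: pass a StringComparer explicitly.
var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["Content-Type"] = "application/json"
};

// Machine-readable output (logs, CSV, config files): format with the invariant
// culture so the decimal separator does not depend on the user's regional settings.
string ToMachineText(double value) =>
    value.ToString("F2", CultureInfo.InvariantCulture);

Console.WriteLine(IsJsonFile("settings.JSON")); // True
Console.WriteLine(headers["content-type"]);     // application/json
Console.WriteLine(ToMachineText(1234.5));       // 1234.50
```

For text shown to users (sorting names, for example), a culture-aware StringComparison such as CurrentCulture is usually the better fit; the point is to choose one deliberately.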

String globalization / Security

You should avoid hard-coding user-facing strings in your application. Instead, put them in resource files (.resx), so that if one day you decide to support multiple languages, it is easy to do so. Multilingual support is no longer optional for web apps: plan for it from day one, as it costs little up front but can save weeks or months in the future. When handling sensitive information, consider using SecureString (although Microsoft no longer recommends it for new development). The ProtectedMemory and ProtectedData classes also often come in handy.

Note: All of the above rely on Windows-specific security services (DPAPI), so they are limited or unsupported on non-Windows platforms in .NET Core.

A skilled developer knows and uses the helpers created by the framework designers rather than reinventing the wheel. Here are some useful tips and helpers for working with strings:

  • Use Environment.NewLine instead of hard-coding the Unix \n or the Windows \r\n
  • Use String.Join to turn an array of values into a comma-separated string (String.Split reverses the process)
  • To check for null or empty strings, use String.IsNullOrEmpty (or String.IsNullOrWhiteSpace if whitespace-only strings should also count as empty)
  • Be mindful of encoding, especially when saving to files and databases; the best choice is usually UTF-8 for files and nvarchar columns (SQL Server's Unicode text type) for databases
  • Pass a CultureInfo to culture-sensitive methods such as ToUpper; it will make your life easier in the future. Did you know that "i".ToUpper() will not yield "I" in Turkey? Under the Turkish culture (tr-TR) it returns "İ", Unicode U+0130.
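Here is a quick sketch of the helpers from the list in action. The sample values are made up; everything else is standard .NET. The Turkish "İ" case is described only in a comment, because creating the tr-TR culture needs culture data that may be missing on some machines (for example, containers running in globalization-invariant mode).

```csharp
using System;
using System.Globalization;

var values = new[] { "apple", "banana", "cherry" };

// String.Join builds the delimited string; String.Split reverses it.
string csv = string.Join(",", values);
string[] roundTrip = csv.Split(',');

// IsNullOrWhiteSpace also treats "   " as empty; IsNullOrEmpty does not.
bool emptyCheck = string.IsNullOrEmpty("   ");
bool whitespaceCheck = string.IsNullOrWhiteSpace("   ");

// Culture-explicit casing: the invariant culture always maps "i" to "I",
// whereas the Turkish culture (tr-TR) would map it to "İ" (U+0130).
string upper = "i".ToUpper(CultureInfo.InvariantCulture);

// Environment.NewLine is "\r\n" on Windows and "\n" on Linux/macOS.
string twoLines = "first" + Environment.NewLine + "second";

Console.WriteLine(csv);                                 // apple,banana,cherry
Console.WriteLine(roundTrip.Length);                    // 3
Console.WriteLine($"{emptyCheck} {whitespaceCheck}");   // False True
Console.WriteLine(upper);                               // I
Console.WriteLine(twoLines);
```

When you want invariant casing without building a CultureInfo, "i".ToUpperInvariant() gives the same result.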

To save yourself time and avoid reinventing the wheel, spend a few minutes here browsing the available extension methods you can copy into your projects. It's good to know what is out there and what is possible, so that when the time comes you have it in the back of your head, can quickly make the right choice, and don't write again what has already been written for you.

Emojis

If you process text containing emojis, you need to be aware of encoding and surrogate pairs. .NET strings are UTF-16, and most emojis are code points above U+FFFF (outside the Basic Multilingual Plane), so each one takes two UTF-16 chars; this is called a surrogate pair. One unexpected result is string length: two simple smileys such as "😀😀" have a Length of 4 as far as C# is concerned, and emojis composed of several code points (skin tones, flags, family sequences) are longer still.
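The difference between chars, code points and user-perceived characters is easiest to see by counting the same string three ways. This sketch assumes .NET Core 3.0 or later (for string.EnumerateRunes) and uses two copies of U+1F600 written as escapes, so the source file encoding does not matter.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Two copies of U+1F600 (grinning face); each lies outside the Basic Multilingual Plane.
string smileys = "\U0001F600\U0001F600";

int utf16Units = smileys.Length;                                 // 4 (two surrogate pairs)
int codePoints = smileys.EnumerateRunes().Count();               // 2
int textElements = new StringInfo(smileys).LengthInTextElements; // 2 user-perceived characters
int utf8Bytes = Encoding.UTF8.GetByteCount(smileys);             // 8 (4 bytes per emoji)

bool isPair = char.IsSurrogatePair(smileys[0], smileys[1]);      // True

// Cutting by char index can split a surrogate pair and leave invalid text...
string broken = smileys.Substring(0, 1);
bool loneHigh = char.IsHighSurrogate(broken[0]);                 // True

// ...whereas cutting by text elements keeps each emoji whole.
string firstEmoji = new StringInfo(smileys).SubstringByTextElements(0, 1);

Console.WriteLine($"{utf16Units} {codePoints} {textElements} {utf8Bytes}"); // 4 2 2 8
Console.WriteLine($"{isPair} {loneHigh} {firstEmoji.Length}");             // True True 2
```

For validation such as "maximum 10 characters", counting text elements usually matches what users expect better than Length does.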

Magic Strings / Consts

You need to be aware of the “burn-in” effect of constants. When you declare a const in one DLL and use it from other projects that reference that DLL, the value does not really stay in the referenced assembly: at compile time it is copied (“burned in”) into the IL of every assembly that uses the const. This means that if you change the const and rebuild only the assembly that declares it, every assembly that references the original DLL will still contain the old value until it is recompiled. That can easily cause hard-to-spot bugs, so if a value may change, consider a static readonly field instead, which is read at run time. Also, I hope this is common knowledge, but never use magic strings! Use consts and enums instead; it's a little extra work, and you will thank yourself in the future for doing it.

Surprised? I know. I was surprised every time I found a second layer of depth in something I considered trivial. This is what separates students from masters: deep knowledge of your tools and of the “little things” that make a difference. So stay vigilant and curious about how things work in this strange world of .NET and software engineering. Good luck! Oh, and in case I forgot something, or you have any other interesting “trivia” regarding strings or other seemingly basic .NET concepts, please leave a comment. If there are enough of them, perhaps I can write a sequel to this post.