Tuesday, June 30, 2015

Some Database Indexing Strategies

If you program anything at all, you probably use some sort of database for storing your data. When it comes to performance, index usage is key; not many people will argue that point. There are, however, many different ways to create indexes. In this post, I investigate three different indexing strategies to establish some guidelines. I'll be using the excellent PostgreSQL. Most of what I'll discuss is relevant to any system that uses indexes (even the trendy NoSQL systems), but you should test to ensure the behavior is identical before committing to a particular strategy.

Starting with some basic information, I'll be investigating 3 indexing strategies against the following table:

CREATE TABLE test_table (
    -- the original column list was truncated; the types shown here are assumed
    s1_c1 bigint, s1_c2 bigint, s1_c3 bigint, s1_c4 bigint, s1_c5 bigint,
    s2_c1 bigint, s2_c2 bigint, s2_c3 bigint, s2_c4 bigint, s2_c5 bigint,
    s3_c1 bigint, s3_c2 bigint, s3_c3 bigint, s3_c4 bigint, s3_c5 bigint
);

The names are significant: each column that starts with s1_ is used to test strategy 1, s2_ for strategy 2, etc. Once the table was created, I populated it with approximately eleventy jillion rows of random data. However, the data is created in such a way that s1_c1 == s2_c1 == s3_c1 and so on for each of the _c* columns. This ensures equal data distributions and fairness across the tests.

I'll be using a query of the following form for tests:

SELECT s?_c5
  FROM test_table
  WHERE s?_c1 = ?
    AND s?_c2 = ?
    AND s?_c3 = ?
    AND s?_c4 = ?;

This roughly simulates a common situation in a database, where you're selecting data based on a complex set of criteria. A far more common situation is where you're selecting all the columns of a table based on a single primary key column, but that case really only has a single indexing strategy available, and is therefore not interesting to this post.

If you're wondering why I didn't tell you exactly how many rows I populated with, it's because the impact of that size is going to vary depending on your hardware: one million rows will be a significant amount of data on one computer, while it's a trivial amount of data on another computer. Suffice to say that the number of rows is large enough that on my testing computer, it takes an average of 225 seconds to run the test query when no indexes are involved. Any manager I've ever worked for would complain about a query that slow, so we'll need to add some indexes.

Strategy 1: Individual Indexes

The first strategy to investigate is placing an individual index on each of the columns specified in the WHERE clause, like this:

CREATE INDEX s1_i1 ON test_table(s1_c1);
CREATE INDEX s1_i2 ON test_table(s1_c2);
CREATE INDEX s1_i3 ON test_table(s1_c3);
CREATE INDEX s1_i4 ON test_table(s1_c4);

This helps our test query quite a bit, cutting the average run time to 21 seconds, an improvement of more than 10x.

Strategy 2: Composite Index

PostgreSQL (like most database systems) allows you to create a single index composed of multiple columns, called a composite index. Creating one to help our test query looks like this:

CREATE INDEX s2_i1 ON test_table(s2_c1, s2_c2, s2_c3, s2_c4);

This improves performance even further, bringing the average run time down to about 50ms (yes, that's milliseconds, or 0.05s).

To understand why the composite index is so much faster, it helps to look at Postgres' EXPLAIN output. For a query using indexing strategy 1, we get this explain:

Bitmap Heap Scan on test_table (cost=3304.35..3755.22 rows=1 width=8)
  Recheck Cond: ((s1_c1 = 936) AND (s1_c2 = 196))
  Filter: ((s1_c3 = 49) AND (s1_c4 = 94))
  -> BitmapAnd (cost=3304.35..3304.35 rows=113 width=0)
    -> Bitmap Index Scan on s1_i1 (cost=0.00..1082.30 rows=58365 width=0)
      Index Cond: (s1_c1 = 936)
    -> Bitmap Index Scan on s1_i2 (cost=0.00..2221.80 rows=120164 width=0)
      Index Cond: (s1_c2 = 196)

What this means is that PostgreSQL searches two of the four possible indexes and generates a result bitmap for each, combines the two bitmaps into a list of rows that match both conditions, and then scans through those rows to filter out the final result. While this seems overly complex, whether or not it's the most efficient method depends on the distribution of the data in the columns, which Postgres determines when it ANALYZEs the table. Other database systems have similar approaches to finding the fastest plan, but in the case of multiple indexes, they'll all involve multiple steps to narrow down the result to its final form.

Now, with a query using the composite index, PostgreSQL is able to find matching rows in a single step:

Index Scan using s2_i1 on test_table (cost=0.56..8.59 rows=1 width=8)
Index Cond: ((s2_c1 = 469) AND (s2_c2 = 326) AND (s2_c3 = 132) AND (s2_c4 = 123))

While a single operation isn't always faster than multiple operations, in this particular case it's much faster. This degree of speedup is fairly typical and is what makes composite indexes so powerful.

Composite indexes have some unfortunate limitations, however. The index data is stored in the order you specify in the CREATE INDEX statement, and it can only be accessed from left to right. For example, if we want to do a query on the table WHERE s2_c2 = ?, the database would be unable to use the index, and we'd have to wait 200+s for a complete table scan. With strategy 1, the database could use index s1_i2 and we'd still get decent performance.
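This left-to-right rule is easy to demonstrate for yourself. Postgres isn't convenient to spin up inside a snippet, but SQLite applies the same leftmost-prefix rule, so here's a quick sketch using Python's built-in sqlite3 module (the table and index names are invented for the example; the plan wording varies slightly between SQLite versions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (c1 INTEGER, c2 INTEGER, c3 INTEGER, c4 INTEGER, c5 INTEGER)")
con.execute("CREATE INDEX t_composite ON t(c1, c2, c3, c4)")

def plan(sql):
    # EXPLAIN QUERY PLAN describes how SQLite will execute the query;
    # the human-readable detail is the fourth column of each row
    return con.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# Leading column(s) present: the composite index is searchable
print(plan("SELECT c5 FROM t WHERE c1 = 1 AND c2 = 2"))
# Leading column missing: SQLite falls back to scanning the table
print(plan("SELECT c5 FROM t WHERE c2 = 2"))
```

The first plan comes back as a SEARCH using t_composite; the second is a SCAN, just like the 200+s table scan described above.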

We could create additional indexes, indexing each column more than once. In many cases, this is a good strategy, but remember the two downsides to indexes: they take up disk space and they slow down inserts and updates. How much space? Well, each of the indexes from strategy 1 uses about 1.4G, for a total of about 5.6G, while the composite index for strategy 2 uses a little over 3G. How much indexes slow down inserts and updates is a complicated question, and a good subject for a future post. For now, suffice it to say that you should carefully test the impact before adding lots of indexes to your production servers.
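The disk-space cost is easy to see on a small scale, too. As a rough sketch (again using Python's built-in sqlite3 so the example is self-contained; absolute sizes will differ wildly between engines and row counts), watch the database file grow when a four-column composite index is added:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "index_size_test.db")

# Build a table with 100,000 four-column rows and measure the file
con = sqlite3.connect(path)
con.execute("CREATE TABLE t (c1 INTEGER, c2 INTEGER, c3 INTEGER, c4 INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)",
                ((i, i * 2, i * 3, i * 4) for i in range(100_000)))
con.commit()
con.close()
table_only = os.path.getsize(path)

# Add the composite index and measure again
con = sqlite3.connect(path)
con.execute("CREATE INDEX t_composite ON t(c1, c2, c3, c4)")
con.commit()
con.close()
with_index = os.path.getsize(path)

print(f"table only: {table_only:,} bytes; with index: {with_index:,} bytes")
```

On my machine the index roughly doubles the file size, which tracks with the 1.4G-per-index numbers above: a composite index over all four columns is carrying a copy of most of the row.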

Strategy 3: Covering Index

When even a composite index isn't fast enough, I can go one step further and make what's called a covering index:

CREATE INDEX s3_i1 ON test_table(s3_c1, s3_c2, s3_c3, s3_c4, s3_c5);

This speeds up our query to about 0.5ms (yes, that's ms ... meaning 0.0005 seconds). The EXPLAIN output shows that Postgres is now doing an "Index Only Scan" which is very similar to the index scan in strategy 2, with one important difference: in a normal index scan, Postgres uses the index to find row IDs that match the WHERE condition. Once it's found them, it has to look at the actual row data to find the requested data (column s?_c5 in our examples).

With a covering index, that column is also included in the index data itself, so Postgres can skip the step of fetching the data from the table, which (in this case) improves performance by about 100x.
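SQLite makes the same optimization visible, which is handy for a self-contained demonstration: when every referenced column lives in the index, the plan reports a "COVERING INDEX", SQLite's analogue of Postgres' Index Only Scan. A sketch (index names invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (c1 INTEGER, c2 INTEGER, c5 INTEGER)")

# Composite but NOT covering: c5 still has to come from the table
con.execute("CREATE INDEX t_partial ON t(c1, c2)")
not_covering = con.execute(
    "EXPLAIN QUERY PLAN SELECT c5 FROM t WHERE c1 = 1 AND c2 = 2").fetchone()[3]

# Add c5 to the index and the planner switches to the covering index
con.execute("CREATE INDEX t_cover ON t(c1, c2, c5)")
covering = con.execute(
    "EXPLAIN QUERY PLAN SELECT c5 FROM t WHERE c1 = 1 AND c2 = 2").fetchone()[3]

print(not_covering)  # index seek, then a trip back to the table for c5
print(covering)      # everything comes from the index itself
```

The second plan mentions "COVERING INDEX t_cover": the table itself is never touched.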

Covering indexes have all the same disadvantages as non-covering composite indexes, except that they're even less flexible. If we change our test query to also return a column that isn't in the index (say, s1_c5), the index is no longer covering, and we either have to live with the performance hit or make a new index that covers s1_c5 too. And since there are additional columns in the index, it's larger on disk and more likely to impact insert or update performance.


Indexing is too complicated to address in a single post, so I attempted to narrow the focus of this post to a review of the benefits of three common strategies. If you don't have tens of millions of rows in your database, you might find that almost any indexing strategy you use yields good results, but if you're hard pressed to get the response time you want, the information in this post should help you plan your strategy. Of course, the differences in data from one application to another make it difficult to make blanket statements, so you should test different approaches and see what works best for you.

In general, however, covering indexes are a good approach when you have a few queries that must be very fast and can each be serviced by a covering index, and when the table isn't seeing a lot of insert and update activity -- or the speed of those modifications is less critical than that of the SELECT queries.

Individual indexes are best when there are a wide variety of queries that need to be serviced and it's difficult to create a few covering indexes that service all of them. Composite (but non-covering) indexes fall somewhere in between.

For complicated situations, a combination of these techniques will probably be required.

In any case where INSERT/UPDATE/DELETE performance is critical, you should carefully test to ensure that each index you create is worthwhile; since each index slows down modification queries a bit. The actual scale of that impact will be the subject of another post.

Thursday, November 13, 2014

Java Thread Synchronization performance: round 2

You didn't think I was done, did you?

The previous post attempted to look at the most basic component of Thread synchronization and estimate its cost. The conclusion was that the cost is too low to be considered an issue.

Now I want to add another element to the testing. Does the number of threads that might hold the lock significantly impact synchronization performance? I'm still not considering actual contention for the lock (I'll get to that). But how well is the synchronization code optimized to handle many threads, even when there is no contention?

My conclusion here? The number of threads does impact synchronization performance; however, that impact is minimal to the point of unimportance. It's possible that continuing to increase the number of threads would eventually find a break point where performance degrades very quickly, but the change was fairly constant up to 1000 threads, and the increase in time is small enough that it's not worth fussing over. If you're building a massive application with many thousands of threads, you should probably run your own tests to be sure of the impact, but otherwise I wouldn't worry about it.

The code and results are included at the end of this post, so you're free to draw your own conclusions.

I ran the tests on FreeBSD 10 with OpenJDK 1.7.0_60-b19 (some of that version number may be FreeBSD-specific version information). Feel free to grab the code and try it out on your favorite JVM. Of course, hardware will have a non-trivial effect on overall run time, but the ratio of run times between the various numbers of threads should be fairly consistent within a specific implementation. Whether other JVMs have different performance characteristics is a question for someone else to answer.

Test Code:
public class Test2 {

  public static boolean run;

  public static void main(String[] args) {
    Test2 app = new Test2();
    app.go();
  }

  private void go() {
    int loops = 30000000;
    int[] sizes = new int[] {1, 1, 5, 10, 50, 100, 500, 1000};
    int runsEach = 30;
    for (int size : sizes) {
      System.out.println(">> Testing with " + size + " threads");
      startThreads(size);
      long total = 0;
      for (int i = 0; i < runsEach; i++) {
        total += doTest(loops);
        try {
          // brief pause between runs (the exact value was lost from the original listing)
          Thread.sleep(50);
        } catch (InterruptedException e) {
          System.out.println("!! Main Thread interrupted");
        }
      }
      stopThreads();
      System.out.println(">> Average time = " + total / runsEach);
      System.out.println(">> Average time per call = " + total / (runsEach * loops));
    }
  }

  private TestThread[] threads;

  private void startThreads(final int num) {
    threads = new TestThread[num];
    run = true;
    for (int i = 0; i < num; i++) {
      threads[i] = new TestThread();
      threads[i].start();
    }
  }

  private long doTest(final int loops) {
    long start, end;
    TestThread test = new TestThread();
    start = System.nanoTime();
    for (int i = 0; i < loops; i++) {
      test.work();
    }
    end = System.nanoTime();
    System.out.println("Run time " + (end - start));
    return end - start;
  }

  private void stopThreads() {
    run = false;
    for (int i = 0; i < threads.length; i++) {
      threads[i].interrupt(); // wake the sleeping thread so it can see run == false
      try {
        threads[i].join();
      } catch (InterruptedException e) {
        System.out.println("!! Thread interrupted before successful join");
      }
    }
  }

  private class TestThread extends Thread {

    public void run() {
      while (run) {
        try {
          // idle; these threads exist only so the JVM has to account for them
          Thread.sleep(10);
        } catch (InterruptedException e) {
          // Just continue with the loop
        }
      }
    }

    public synchronized void work() {
      float f = 1;
      for (int i = 0; i < 10; i++) {
        f += f;
      }
    }
  }
}

Test Output:
>> Testing with 1 threads
Run time 795962042
Run time 786286857
Run time 199196969
Run time 16297397
Run time 17163671
Run time 16824990
Run time 16407745
Run time 17554201
Run time 15890607
Run time 17541246
Run time 16434331
Run time 16932517
Run time 16085874
Run time 15215291
Run time 15619357
Run time 15115641
Run time 15780876
Run time 15205568
Run time 15643645
Run time 15472839
Run time 15974359
Run time 15157174
Run time 15617533
Run time 15438992
Run time 15468485
Run time 15507157
Run time 15244974
Run time 15875522
Run time 15191091
Run time 15783097
>> Average time = 73729668
>> Average time per call = 2

>> Testing with 1 threads
Run time 15386418
Run time 15348036
Run time 15663421
Run time 15212932
Run time 15658743
Run time 15153505
Run time 15835270
Run time 15256209
Run time 15382945
Run time 15965559
Run time 15231225
Run time 15880726
Run time 15271493
Run time 175648894
Run time 178731461
Run time 190533321
Run time 199313983
Run time 199060600
Run time 199415887
Run time 198765814
Run time 199796152
Run time 198772032
Run time 199028302
Run time 199572685
Run time 199027456
Run time 199549007
Run time 198859863
Run time 198762869
Run time 198742303
Run time 198504792
>> Average time = 117777730
>> Average time per call = 3

>> Testing with 5 threads
Run time 199484393
Run time 198697191
Run time 200076825
Run time 199277548
Run time 199980755
Run time 199297630
Run time 199438981
Run time 199254148
Run time 198903016
Run time 199377172
Run time 199109173
Run time 200369934
Run time 199544113
Run time 200267803
Run time 199367557
Run time 200217676
Run time 199541885
Run time 199430942
Run time 199129716
Run time 198968970
Run time 198969677
Run time 199830897
Run time 198897138
Run time 199976410
Run time 199251960
Run time 199445897
Run time 199301249
Run time 198974016
Run time 198842029
Run time 199144413
>> Average time = 199412303
>> Average time per call = 6

>> Testing with 10 threads
Run time 199150382
Run time 199320156
Run time 199138419
Run time 199160883
Run time 199245564
Run time 199021987
Run time 198903988
Run time 199016954
Run time 199358554
Run time 199310135
Run time 200188974
Run time 199290615
Run time 199183679
Run time 199238515
Run time 199217762
Run time 199035672
Run time 199041539
Run time 199471882
Run time 199150993
Run time 199165169
Run time 199301659
Run time 198992886
Run time 199086269
Run time 199257415
Run time 198941664
Run time 199705914
Run time 199222760
Run time 199960150
Run time 199671478
Run time 200217657
>> Average time = 199298989
>> Average time per call = 6

>> Testing with 50 threads
Run time 200530217
Run time 199892436
Run time 197262962
Run time 201208398
Run time 199715748
Run time 199992779
Run time 200659787
Run time 200165258
Run time 199800937
Run time 199635001
Run time 200087977
Run time 200112382
Run time 200363229
Run time 200298460
Run time 199473022
Run time 199927887
Run time 199505967
Run time 200106173
Run time 200292659
Run time 200843640
Run time 199689815
Run time 200025039
Run time 200644594
Run time 199538623
Run time 199061118
Run time 199193134
Run time 200095598
Run time 199744275
Run time 199588208
Run time 198247397
>> Average time = 199856757
>> Average time per call = 6

>> Testing with 100 threads
Run time 200634862
Run time 200379624
Run time 200683595
Run time 200798071
Run time 199676874
Run time 200784278
Run time 200763565
Run time 200921517
Run time 200344951
Run time 200623614
Run time 200946937
Run time 200934182
Run time 200399438
Run time 200897918
Run time 200741948
Run time 200843386
Run time 201104921
Run time 200010681
Run time 200800678
Run time 200658461
Run time 200565042
Run time 200107552
Run time 200443788
Run time 200282941
Run time 200987905
Run time 201173373
Run time 200345061
Run time 200222440
Run time 200900207
Run time 199372010
>> Average time = 200578327
>> Average time per call = 6

>> Testing with 500 threads
Run time 203454398
Run time 205942401
Run time 210786881
Run time 206963343
Run time 202776739
Run time 205442114
Run time 208851151
Run time 207165043
Run time 215251977
Run time 209431195
Run time 205501259
Run time 215273283
Run time 212493174
Run time 212313821
Run time 206062559
Run time 208761225
Run time 209276176
Run time 210185340
Run time 206295473
Run time 208948272
Run time 208813577
Run time 209519561
Run time 211457223
Run time 208937196
Run time 205003769
Run time 211591797
Run time 208652449
Run time 207622638
Run time 210153556
Run time 211430555
>> Average time = 208811938
>> Average time per call = 6

>> Testing with 1000 threads
Run time 221468163
Run time 220505584
Run time 221465052
Run time 221587124
Run time 219965864
Run time 222025735
Run time 229103364
Run time 223751433
Run time 232994489
Run time 232980384
Run time 228709413
Run time 231892879
Run time 234950720
Run time 235469612
Run time 233690739
Run time 234489554
Run time 225850296
Run time 232385003
Run time 232641676
Run time 236887098
Run time 227695483
Run time 228017456
Run time 229519798
Run time 231757513
Run time 231118086
Run time 230468418
Run time 232784509
Run time 232030684
Run time 232966890
Run time 227513589
>> Average time = 229222886
>> Average time per call = 7

Tuesday, November 11, 2014

Java thread synchronization performance

I was surprised that I couldn't find anything useful on this topic from a Google search. There are a lot of questions and discussions about Thread synchronization, which is apt, since the topic is important and not terribly simple.

But there are some questions that get asked over and over but never answered, despite the answer being quite simple.

Here's the one that I couldn't find the answer to: how much overhead is there to entering a synchronization block? Just, flat out, with no other nonsense involved, how much does entering a synchronized block slow down your code?

To hear a lot of people discuss it, you'd think that entering a synchronized block starts a chain of events culminating in a petition sent to Congress that could delay processing by weeks. I'll give you my results in a moment, but I suspect that misconception is the result of people watching a poorly designed multi-threaded program essentially become single-threaded because of the overuse of synchronization. That, however, is an entirely different topic.

EDIT: I'm continuing my research into synchronization, see the next installment here.

Let's construct a scenario to illustrate the question: you have a process that needs to run very fast based on the application requirements. Under normal conditions, there is only a single thread running this process, but occasionally (in unusual circumstances) there may be more than one. It's absolutely vital that these threads do not trample all over each other, so some sort of synchronization is required. However, in the typical case, you don't want thread performance to be slowed by synchronization that isn't necessary. Can you just use synchronized blocks, or will they create a performance bottleneck in the single-threaded case, requiring you to contrive some sort of synchronization that can be switched off and on?

So I wrote this simple test:

public class PerfTest {

  public static void main(String[] argc) {
    PerfTest pt = new PerfTest();
    pt.run();
  }

  private void run() {
    final int runs = 10000;
    for (int loop = 0; loop < 10; loop++) {
      long start, end;

      start = System.nanoTime();
      for (int i = 0; i < runs; i++) {
        withLock();
      }
      end = System.nanoTime();
      System.out.println("Synchronized run time   " + (end - start));

      start = System.nanoTime();
      for (int i = 0; i < runs; i++) {
        withoutLock();
      }
      end = System.nanoTime();
      System.out.println("Unsynchronized run time " + (end - start));
    }
  }

  private synchronized void withLock() {
    common();
  }

  private void withoutLock() {
    common();
  }

  private void common() {
    float f = 1;
    for (int i = 0; i < 10; i++) {
      f += f;
    }
  }
}

Synchronized run time   1533107
Unsynchronized run time 801967
Synchronized run time   346320
Unsynchronized run time 304972
Synchronized run time   340985
Unsynchronized run time 305335
Synchronized run time   342456
Unsynchronized run time 307547
Synchronized run time   343720
Unsynchronized run time 307947
Synchronized run time   343797
Unsynchronized run time 305243
Synchronized run time   344508
Unsynchronized run time 306182
Synchronized run time   340217
Unsynchronized run time 307223
Synchronized run time   340816
Unsynchronized run time 304765
Synchronized run time   343846
Unsynchronized run time 305347

Do the math and you come up with the answer that synchronized costs about 3 nanoseconds per call (for those who aren't privy to that unit, that's 0.000000003 seconds).

Obviously, this is on a specific piece of hardware with a specific JVM installed. Other virtual machines on other hardware will likely produce variations in timing.

But the act alone of entering a synchronized block probably isn't enough of a performance hit to shy away from it. The lesson being: if you think your code might ever be multi-threaded and its operation could be hindered by unsynchronized access, add synchronized blocks. If your performance concerns are so great that 3 nanoseconds is too long, then you probably shouldn't be writing Java in the first place.
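The same shape of experiment is easy to repeat in other languages. Here's a rough Python sketch (a threading.Lock is a different mechanism from a Java monitor and the absolute numbers will be very different, so treat this purely as a measurement template, not a comparison):

```python
import threading
import time

lock = threading.Lock()

def common():
    # same busy-work as the Java test: ten float additions
    f = 1.0
    for _ in range(10):
        f += f

def with_lock():
    with lock:  # uncontended acquire/release on every call
        common()

def time_per_call(fn, runs=200_000):
    # average wall-clock nanoseconds per call of fn
    start = time.perf_counter_ns()
    for _ in range(runs):
        fn()
    return (time.perf_counter_ns() - start) / runs

unsync = time_per_call(common)
sync = time_per_call(with_lock)
print(f"unsynchronized: {unsync:.0f} ns/call, locked: {sync:.0f} ns/call")
```

The difference between the two numbers is the per-call cost of an uncontended lock, the same quantity the Java test above measures.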

There's obviously a lot more to thread synchronization. Hopefully I'll take some time to write some more in the near future.

Thursday, October 23, 2014

Idamu Caverns: Lessons Learned

I'm winding Idamu Caverns down for a bit.  Overall, I just can't justify the amount of time I've been spending on it, and if I don't get some other projects rolling, I'm going to end up with a money problem soon.  I'm not saying I'm abandoning development of the game, but there will certainly be a pause for the remainder of 2014.  Given the low level of interest, I doubt if anyone will notice, but touch base with me if you're interested in seeing development continue.

The more interesting part of the project (to me anyway) is lessons learned.  I think the easiest way to describe these various lessons is to examine the goals of the project, and discuss whether or not each goal was achieved.

Goal 1: Familiarize myself with the Android platform

I think IC was most successful in fulfilling this goal.  While I'm not a "Google Recognized Expert Developer", I find that I'm able to write Android code quickly and effectively now, without getting bogged down in problems like how to write an app that doesn't lock up, etc.  There were a lot of points during the early development of IC where I tried to use a desktop programming paradigm that just didn't work right, and I feel like I'll be able to avoid those mistakes with my next projects.

Goal 2: Familiarize myself with the Android market

I feel like this was also successfully achieved.  I know my way around the play store, I have a good feel for what is necessary to launch an app, and I feel like I can put the required pieces in place without a lot of flailing for future app launches.

With both goals 1 and 2, I feel like releasing IC as a free app was extremely helpful.  It allowed me to make mistakes (which I did, for sure) without having upset paying customers, requests for refunds, or anything else nasty come through the Play Store.  It set the expectation with the consumers and I feel like it also established goodwill -- I don't know of anyone who feels cheated by Idamu Caverns, to put it plainly.

Goal 3: Build a community around a work in progress

Utter failure, without a doubt.  My attempts at social media, as well as paid advertising, just haven't resulted in the kind of response that's needed to actually build a community.  It's tough to know why things never took off, but I have two guesses: possibly the Roguelike genre just isn't popular enough to bring in a large community, or possibly there just aren't many people out there who want to be part of a work in progress.

A number of people commented on this when I first started, and I admit that this was one of the big conceits of the project: the idea that people would jump in and contribute.  Many people told me that it wouldn't work, in many different ways, everything from "people don't know what they want" to "you can't give people a blank slate and expect them to be interested."  I didn't feel like IC was ever a blank slate, but apparently, in some way, it was too blank for people to be inspired by.

In general, I didn't get anywhere near the level of interest I expected at all.  Even ignoring the game input, the number of people who downloaded the game was frighteningly small.  I did (what I felt was) a fair amount of paid marketing, as well as doing my rounds on the social networks.  I have to believe that it's the lack of allure of the game and not the marketing, otherwise it's difficult to believe that anyone would ever succeed at marketing any game at all.

Goal 4: Turn Idamu Caverns into a profitable venture

This goal, without a doubt, was not achieved, and its failure is the primary reason I'm putting the project on the back burner.  The plan had always been "pay what you want" for the app (using Patreon).  It seems to me that this failing is a chain reaction from the failure to build a community: since I've only got about 100 persistent players (and only 200 people who've tried the game, total), I can't expect to make much money from that small an audience.  I'd planned for the game to be making at least a few hundred dollars per month by this point.  My most profitable month to date earned me $4.

Android's NotificationManager ignoring errors

I found another one. This has been a rather frustrating week for me, working around quirks in Android, many of which are clearly bugs (like this one, or yesterday's post) and others where the jury is still out.

Today, it's a problem with NotificationManager. Specifically, NotificationManager.notify() silently swallows invalid notifications. In particular, if you create a notification without calling setSmallIcon(), the NotificationManager will simply not bother to add it to the notification panel. Like yesterday's problem, this is particularly difficult to debug because there are no errors or warnings anywhere.

What makes the whole thing more interesting is that if your notification includes a sound or vibration, those actions will still happen, which leads you to believe that the weirdest thing ever is going on, since nothing shows up after the noise. Of course, that's slightly less weird than having it do nothing at all, which leaves a developer with absolutely no idea how to debug or proceed.

I hope the Android team manages to find time to fix some of these issues.

Android SQLite and rawQuery()

It seems as if rawQuery() is unable to process data modification statements such as UPDATE or DELETE.

I don't know if this is by design of SQLite, or by design of the Android API into SQLite, or an accident of one or the other, but it would be nice if this were documented somewhere. It's possible that some part of some manual somewhere has a note on it, but when I brought myself up to speed on the tools, I saw it nowhere; and over the course of two days' worth of debugging to figure out why my code wasn't working, my searches didn't turn up any mention of it.

It appears that the only way to modify rows is to use the update() method. There's nothing wrong with that.

What is wrong is the fact that when given an UPDATE query, the rawQuery() method simply does nothing, without throwing an exception, logging anything, or in any way indicating that something has gone wrong.

This is the second piece of developer-hostile behavior I've come across since I started doing Android programming earlier this year. Here's hoping that Google will fix it, and anywhere else in the system where errors mysteriously disappear. If a method is passed illegal instructions, it should throw an exception. I can think of no justification for silently swallowing an invalid instruction.

Tuesday, October 7, 2014

Ahh ... the Internet culture

Less than a day after moving my new app, This is Why I'm Right, to production, and before starting any of the launch activities (such as marketing or an actual announcement) I've got a single 1 star review.

Aviv Gelman eloquently describes the app as "BS app, remove!!!"

This guy is very serious about his review. You can tell by the number of exclamation points. Also, by the fact that he feels that Google should remove the app. Oddly, using This is Why I'm Right, I find that BS apps have more fun per download than serious apps.

So, since I'm obviously right, I'm more interested in Aviv's experience.

First off, the term "BS app" is pretty ambiguous. What does that even mean? The app does exactly what it claims it does, so it's not BS in the deceptive sense. It doesn't steal your data or crash your phone or do sneaky things, so it's not BS in the criminal sense. Since he was complaining about the free version, he can't even argue that it wasn't worth the price he paid.

The only conclusion I can come to is that the app hurt Aviv's feelings. It was inevitable, really, that some people's feelings would be hurt. When you don't get a joke, often your feelings are hurt. And sometimes when a person has hurt feelings she/he lashes out violently, with things like 1 star ratings.

I'm sorry that I hurt your feelings, Aviv. Whatever you do, please don't download any animal simulators ...