I needed to sort some SQLite table result with a natural sort algorithm for a small project. For strings, Natural Sort Order is more human friendly than traditional Alphabetical Order. There is a comparison below showing the order difference of Alphabetical Order and Natural Sort Order.
|Alphabetical Order||Natural Sort Order|
In my case, Natural Sort Order is an expected order so I implemented it in SQLite using my favorite language Python.
First of all, we need a natural sort algorithm. After quick internet search,
natsort python package seems to be the best solution with its BSD License. You can install it with
pip install natsort
SQLite recommends to create new collations for custom sorting. This new collation can be created like this after creating database connection.
sqlite3_connection_instance.create_collation("collation name", callable)
I created a code snippet to show it. Please check the
natsort documentation for further reading. Final Order of inserted items is satisfactory for me.
#!/usr/bin/python # encoding: utf-8 from __future__ import print_function from natsort import natsorted, ns import sqlite3 # this is our comparison function using natural sort def nsort(str1, str2): if str1 == str2: return 0 natorder = natsorted([str1, str2], alg=ns.GROUPLETTERS) idx = natorder.index(str1) if idx == 0: return -1 else: return 1 con = sqlite3.connect(":memory:") # register our natural sort function as collation con.create_collation("naturalsort", nsort) cur = con.cursor() cur.execute("CREATE TABLE main (name)") con.commit() cur.execute("INSERT INTO main (name) VALUES ('A-0')") cur.execute("INSERT INTO main (name) VALUES ('A-01')") cur.execute("INSERT INTO main (name) VALUES ('a-010')") cur.execute("INSERT INTO main (name) VALUES (NULL)") cur.execute("INSERT INTO main (name) VALUES ('A-2')") cur.execute("INSERT INTO main (name) VALUES ('A-02')") con.commit() cur.execute("SELECT * FROM main ORDER BY name COLLATE naturalsort ASC") print(cur.fetchall()) # Result is: # [(None,), (u'A-0',), (u'A-01',), (u'A-2',), (u'A-02',), (u'a-010',)]
My second method is to use an alphanumeric comparison algorithm for natural sorting. Dave Koelle has coded this for various programming languages. Implementation in SQLite is simple and python code for The Alphanum Algorithm is as below.
... con.create_collation("naturalsort", alphanum) ...
# # The Alphanum Algorithm is an improved sorting algorithm for strings # containing numbers. Instead of sorting numbers in ASCII order like # a standard sort, this algorithm sorts numbers in numeric order. # # The Alphanum Algorithm is discussed at http://www.DaveKoelle.com # #* Python implementation provided by Chris Hulan ([email protected]) #* Distributed under same license as original # # This library is free software; you can redistribute it and/or # modify it under the terms of the GNU Lesser General Public # License as published by the Free Software Foundation; either # version 2.1 of the License, or any later version. # # This library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Lesser General Public License for more details. # # You should have received a copy of the GNU Lesser General Public # License along with this library; if not, write to the Free Software # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA # import re # # TODO: Make decimal points be considered in the same class as digits # def chunkify(str): """return a list of numbers and non-numeric substrings of +str+ the numeric substrings are converted to integer, non-numeric are left as is """ chunks = re.findall("(\d+|\D+)",str) chunks = [re.match('\d',x) and int(x) or x for x in chunks] #convert numeric strings to numbers return chunks def alphanum(a,b): """breaks +a+ and +b+ into pieces and returns left-to-right comparison of the pieces +a+ and +b+ are expected to be strings (for example file names) with numbers and non-numeric characters Split the values into list of numbers and non numeric sub-strings and so comparison of numbers gives Numeric sorting, comparison of non-numeric gives Lexicographic order """ # split strings into chunks aChunks = chunkify(a) bChunks = chunkify(b) return cmp(aChunks,bChunks) #built in comparison works once data is prepared if __name__ == "__main__": unsorted = ["1000X Radonius Maximus","10X Radonius","200X Radonius","20X Radonius","20X Radonius Prime","30X Radonius","40X Radonius","Allegia 50 Clasteron","Allegia 500 Clasteron","Allegia 51 Clasteron","Allegia 51B Clasteron","Allegia 52 Clasteron","Allegia 60 Clasteron","Alpha 100","Alpha 2","Alpha 200","Alpha 2A","Alpha 2A-8000","Alpha 2A-900","Callisto Morphamax","Callisto Morphamax 500","Callisto Morphamax 5000","Callisto Morphamax 600","Callisto Morphamax 700","Callisto Morphamax 7000","Callisto Morphamax 7000 SE","Callisto Morphamax 7000 SE2","QRS-60 Intrinsia Machine","QRS-60F Intrinsia Machine","QRS-62 Intrinsia Machine","QRS-62F Intrinsia Machine","Xiph Xlater 10000","Xiph Xlater 2000","Xiph Xlater 300","Xiph Xlater 40","Xiph Xlater 5","Xiph Xlater 50","Xiph Xlater 500","Xiph Xlater 5000","Xiph Xlater 58"] sorted = unsorted[:] sorted.sort(alphanum) print '+++++Sorted...++++' print '\n'.join(sorted)