Database, Array, Hash - Learning Resources - Crystal Forum
Hello,
yesterday I used Crystal for the first time. I wanted to see how it compares to Ruby in a simple but important task: processing and querying data. As a temporary step, I wanted to read hashes from the database, fix the format (real numbers stored as text), order them by value, and write them back. I also wanted to write back an ordered Array of the keys, as JSON object keys aren’t ordered.
The JSON strings (Hashes) have tens of thousands of key-value pairs.
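Leaving the database aside, the transformation itself can be sketched on a tiny sample. This is only an illustration: the `reorder` helper and the sample keys are made up for the example, and the real rows are far larger.

```ruby
require 'json'

# Parse one JSON object, convert keys to integers and values to floats,
# and order the pairs by value, highest first.
# Returns the ordered Hash and the ordered Array of keys.
# (`reorder` is a hypothetical helper, not part of the scripts below.)
def reorder(raw_json)
  pairs = JSON.parse(raw_json)
              .map { |key, value| [key.to_i, value.to_f] }
              .sort_by { |_key, value| -value }
  [pairs.to_h, pairs.map(&:first)]
end

# Made-up sample standing in for one row's `percentages` column.
raw = '{"101": "50.71", "7": "90.36", "42": "86.16"}'
ordered_hash, ordered_keys = reorder(raw)

puts JSON.generate(ordered_hash)  # the hash, ready to write back
puts JSON.generate(ordered_keys)  # the key order, stored separately
```

Storing the keys as a separate array keeps the order explicit, because a JSON consumer is free to ignore the order of the object's members.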
Ruby version:
require 'pg'
require 'json'
base = PG.connect('host=localhost dbname=xxx')
rows = base.query('select * from temp order by user_id limit ')
puts rows.class
start = Time.now
rows.each do |row|
  results_hash = JSON.parse row['percentages']
  results = results_hash.map { |key, value| [key.to_i, value.to_f] }.sort_by { |x| x[1] }.reverse.to_h
  puts results.inspect[0..20]
end
puts Time.now - start
$ time ./reorder.rb
PG::Result
{=>90.36,
{=>50.71,
...
{=>86.16,
{=>84.63,
{=>85.71,
57.
real 1m1.122s
Ruby version that first converts the rows from the PG::Result into an Array. Only one line differs:
...
rows = base.query('select * from temp order by user_id limit ').to_a
...
$ time ./a_reorder.rb
Array
{=>90.36,
{=>50.71,
...
{=>86.16,
{=>84.63,
{=>85.71,
56.
real 1m0.557s
Crystal version, using PG::ResultSet:
require "db"
require "pg"
module Crystal
  DB.open "postgresql://localhost/xxx" do |db|
    db.query("select * from temp order by user_id limit ") do |rows|
      puts rows.class
      start = Time.now
      rows.each do
        user_id, percentages = rows.read Int32, JSON::Any
        results = percentages.as_h.map { |key, value| [key.to_i, value.to_s.to_f] }.sort_by { |x| x[1] }.reverse.to_h
        puts results.inspect[0..20]
      end
      puts Time.now - start
    end
  end
end
$ time ./crystal
PG::ResultSet
{ => 90.36, 742
{ => 50.71, 249
...
{ => 86.16, 532
{ => 84.63, 928
{ => 85.71, 264
00:00:29.
real 0m29.469s
Crystal version, using Array:
require "db"
require "pg"
module Crystal
  DB.open "postgresql://localhost/xxx" do |db|
    rows = db.query_all "select * from temp order by user_id limit ", as: {Int32, JSON::Any}
    puts rows.class
    start = Time.now
    rows.each do |row|
      results = row[1].as_h.map { |key, value| [key.to_i, value.to_s.to_f] }.sort_by { |x| x[1] }.reverse.to_h
      puts results.inspect[0..20]
    end
    puts Time.now - start
  end
end
$ time ./rows
Array(Tuple(Int32, JSON::Any))
{ => 90.36, 742
{ => 50.71, 249
...
{ => 86.16, 532
{ => 84.63, 928
{ => 85.71, 264
00:00:21.
real 0m53.327s
Note that I originally wrote the Ruby version with sort { |x, y| y[1] <=> x[1] } instead of sort_by {}.reverse, but the <=> operator didn’t work in my Crystal code, and I had already spent enough time getting the database connection to work. Hence I used sort_by {}.reverse in both languages to be fair.
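For reference, here is a small Ruby sketch showing that the comparator form and the sort_by forms give the same descending order when the values are distinct. The method names and sample pairs are made up for the illustration, and it says nothing about why <=> failed in the Crystal code.

```ruby
# Three ways to order [key, value] pairs by value, highest first.
def by_comparator(pairs)
  pairs.sort { |x, y| y[1] <=> x[1] }
end

def by_sort_by_reverse(pairs)
  pairs.sort_by { |x| x[1] }.reverse
end

def by_negated_key(pairs)
  pairs.sort_by { |x| -x[1] }
end

pairs = [[101, 50.71], [7, 90.36], [42, 86.16]]
p by_comparator(pairs)       # [[7, 90.36], [42, 86.16], [101, 50.71]]
p by_sort_by_reverse(pairs)  # same order
p by_negated_key(pairs)      # same order
```

With equal values the three forms may order the tied pairs differently, since Ruby's sort is not guaranteed to be stable.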
There were details that surprised me. Other details didn’t.
I knew that Ruby had an excellent PostgreSQL driver. Years ago, when Rust was about v1.0, I compared Ruby vs. Rust vs. Go in raw query performance. Ruby was the fastest due to the cached prepared query. Go was the slowest.
It surprised me that converting the rows to an Array in Ruby took almost no time, and it ran for the same time as the PG::Result.each version.
It surprised me even more how slow the query-the-rows-as-Array version was in Crystal.
You can see that both Ruby versions spent about 4 seconds on querying, fetching, and processing the result from the database. The rest of the time was spent on converting the strings to integers and floats, ordering them, and making them a Hash again.
Crystal was different. The whole conversion (JSON to Hash, strings to integers and floats, ordering, and making it a Hash again) took only 22 seconds. However, query_all was so slow that, in the end, that Crystal version ran at nearly the same speed as Ruby.
If I consider that the conversion was about 21 seconds, then the query-fetch part using db.query took only 7-8 seconds.
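The arithmetic behind these estimates can be written out as a rough sketch. It assumes the truncated in-program timers (57, 56, 21) are whole seconds, so each result may be off by up to a second; `outside_timer` is a made-up helper name.

```ruby
# Seconds not covered by the reference time: wall-clock `real` minus
# either the in-program timer or the assumed conversion time.
def outside_timer(real, reference)
  (real - reference).round(1)
end

# [label, `real` from `time`, in-program timer taken as whole seconds]
RUNS = [
  ['Ruby, PG::Result',   61.122, 57.0],
  ['Ruby, Array',        60.557, 56.0],
  ['Crystal, query_all', 53.327, 21.0]
].freeze

RUNS.each do |label, real, timed|
  puts "#{label}: ~#{outside_timer(real, timed)} s outside the timed loop"
end

# In the db.query version the rows are read inside the timed loop, so the
# estimate subtracts the ~21 s conversion time measured in the query_all run.
puts "Crystal, db.query: ~#{outside_timer(29.469, 21.0)} s for query and fetch"
```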
Conclusion:
In this test, querying and fetching in Ruby was twice as fast (4s vs. 8s) as Crystal’s db.query and rows.each. Ruby was even a bit faster than that, as the 4s includes Ruby’s startup overhead. That is to be expected, however. Ruby also beat Rust and Go in this area (many years ago, when I tested it). The developer of the Rust pg driver wrote to me that the Ruby pg driver was excellent C code.
I wonder why the query-and-fetch part of the db.query_all and rows.each do |row| version in Crystal was about ten times slower than Ruby, as well as about five times slower than the other Crystal version.
As for the typecasts: I checked the JSON part of the Crystal driver. It also calls JSON.parse on the field, but in Crystal there is an additional typecast, I believe: the JSON::Any to Hash conversion in my code. (In Ruby, JSON.parse returns a Hash.) Still, these casts were 2.7 times faster (if I counted right) in Crystal. JSON::Any didn’t support #map, unfortunately. In the end, Crystal was twice as fast as Ruby when I used the db.query version.
I’m not comparing the performance of the two languages. It was my first Crystal code, only a tiny selection of the features. Besides, most of these Ruby methods are in C.
However, it wasn’t a synthetic test either. I was comparing solutions. I have to process this data for real. The original table has a few billion records. Crystal will help.
Cheers.
Crystal Reports Grouping by Array with Subreports
Good day,
I’ll do my best to explain things, but I can also provide more details based on the questions asked in return. I appreciate your wisdom.
We have an existing class attendance report that currently runs for a single day. We want to adjust the report to print for a week at a time while the printouts keep their daily format. The report lists students grouped by Preschool Site and then Classroom, and each Classroom group has multiple sub-reports (Early Drop-Off Students, Regular Day Students, Extended Day Students, etc.), resulting in a page (or two) per classroom per date. The print job would need to be in this order:
Class A Monday
Class A Tuesday
Class A Wednesday
Class A Thursday
Class A Friday
Class B Monday
Class B Tuesday
Etc.
I expect that to do this we would need a new Date group, so that the groups would be: Site -> Class Room -> Date.
We have been looking into Arrays, in an attempt to run the report for each day of the week in a single run.
My main question at this time: can anyone confirm that Crystal will allow an array that supplies information to a group (not the top group), and also allow that array to feed into multiple sub-reports? The array would always have 5 dates.
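To pin down the ordering being asked for, here is a small sketch in Ruby rather than in Crystal Reports formula syntax. It makes no claim about what Crystal Reports supports; the helper names, the class names, and the example Monday are all assumptions, and the Site level is left out for brevity.

```ruby
require 'date'

# The five dates of one school week, starting from a given Monday.
def week_dates(monday)
  (0...5).map { |offset| monday + offset }
end

# Desired print order: all five dates for one classroom,
# then all five dates for the next classroom.
def print_order(classrooms, dates)
  classrooms.flat_map { |room| dates.map { |date| [room, date] } }
end

dates = week_dates(Date.new(2024, 1, 1))  # 2024-01-01 is a Monday (example date)
print_order(['Class A', 'Class B'], dates).each do |room, date|
  puts "#{room} #{date.strftime('%A')}"
end
```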
Thank you,
Matt