Cheeky Solr Grabber
Published: August 3rd 2018
I had one of those moments where I needed to pull data out of Solr and realised that the nature of the query meant it was going to time out
unless I ran it in batches. Since I'm rather averse to manual crap (or rather, busy at the moment with many spinning plates),
I thought I'd write a quick and dirty piece of code to do things for me in the background.
Yet more Bash Scripts!
#!/bin/bash
# CHEEKY SOLR BATCHENATOR
# Grabs info from a query, batches it up then concats to an output file
# Which Solr Server
echo -e "Solr Server IP?"
read -r solrServer
# Which Keyspace
echo -e "Keyspace?"
read -r keyspace
# Which Shard
echo -e "Which Shard?"
read -r shard
# Which Shard Replica
echo -e "Which Replica?"
read -r replica
# q Query (must already be URL encoded)
echo -e "What is the value of q?"
read -r qQuery
# fl Query
echo -e "What is the value of fl?"
read -r flQuery
# How many rows do you want to return
echo -e "How many items to loop over?"
read -r max
# Where do you want to begin from - 0 or where you left off, e.g. 1000
echo -e "Beginning from what row?"
read -r begin
# How many rows do you want to return in each iterative run? ( Batch size )
echo -e "How many rows per cycle?"
read -r rowsPerCycle
# Calculate how many batches there are (matches the -le test in the loop below)
batchsize=$(( (max - begin) / rowsPerCycle + 1 ))
batchNum=1
# Undertake the query on the cluster
while [ "$begin" -le "$max" ]
do
echo -e "Batch $batchNum of $batchsize"
# Run the query against Solr (braces stop Bash reading $keyspace_ as one variable name)
curl -sS "http://${solrServer}:8080/solr/${keyspace}_${shard}_${replica}/select?q=${qQuery}&fl=${flQuery}&start=${begin}&rows=${rowsPerCycle}&wt=csv&indent=true" >>solr-output-curl.csv
# Increment the counters
begin=$((begin + rowsPerCycle))
batchNum=$((batchNum + 1))
done
# Exit successfully
exit 0
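
If you want to sanity-check the paging before pointing it at a real cluster, something like the following might help. It is only a sketch: batch_offsets is a made-up helper name, not part of the script above, and it simply prints the start offsets the loop would request, using the same "begin is less than or equal to max" test.

```shell
#!/bin/bash
# batch_offsets: hypothetical helper (not in the original script).
# Prints the start offset of each batch, one per line, without touching Solr.
batch_offsets() {
    local begin="$1" max="$2" rowsPerCycle="$3"
    # A zero or negative batch size would never advance, so refuse it
    if [ "$rowsPerCycle" -le 0 ]; then
        echo "rowsPerCycle must be greater than 0" >&2
        return 1
    fi
    # Same condition as the main loop: keep going while begin <= max
    while [ "$begin" -le "$max" ]; do
        echo "$begin"
        begin=$((begin + rowsPerCycle))
    done
}

batch_offsets 0 1000 250
```

With those example values it prints 0, 250, 500, 750 and 1000, so five requests in total. The last one starts at row 1000 itself, which is worth knowing if you were expecting exactly 1000 rows back.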
It's not mahoosively efficient in operation, but it did what was necessary to return the data I needed. Of course, the query needs to be URL encoded. I pre-encoded mine in a Solr query interface before just shoving it directly into the query section, but I've tried to break it out here in case it's of use.
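
If you'd rather not pre-encode the query by hand, a small Bash function can do the percent-encoding for you. This is a rough sketch rather than what I actually ran: urlencode is a hypothetical helper name, and it assumes the query is plain ASCII.

```shell
#!/bin/bash
# urlencode: hypothetical helper (not in the original script).
# Percent-encodes everything except unreserved characters (A-Z a-z 0-9 . ~ _ -).
# Assumes ASCII input.
urlencode() {
    local LC_ALL=C
    local input="$1" output="" char i
    for (( i = 0; i < ${#input}; i++ )); do
        char="${input:i:1}"
        case "$char" in
            [a-zA-Z0-9.~_-]) output+="$char" ;;
            # "'$char" makes printf use the character's numeric code
            *) printf -v char '%%%02X' "'$char"; output+="$char" ;;
        esac
    done
    printf '%s\n' "$output"
}

urlencode 'title:"foo bar" AND id:[1 TO 5]'
```

The example query is invented purely for illustration. Its encoded form could then be fed in as the value of q. Another option is to let curl do the work with -G and --data-urlencode "q=...", which appends the encoded parameter to the URL for you.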