Cheeky Solr Grabber
Published: August 3rd 2018
I had one of those moments where I needed to pull data out of Solr and realised that, because of the nature of the query, it was going to time out
unless I ran it in batches. As I'm rather averse to manual crap, and I'm busy at the moment with many spinning plates,
I thought I'd write a quick and dirty piece of code to do it for me in the background.
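Batching a Solr query like this is just paging with the start and rows parameters: each request asks for a block of rows beginning at an offset, and the next request moves that offset on by the batch size. Here is a minimal sketch of the arithmetic in Bash. It assumes zero-based row positions and a starting row that is not past the end, and the function names batch_offsets and batch_count are made up for illustration:

```shell
#!/usr/bin/env bash
# Sketch only: works out the paging a batched Solr export would use.
# Hypothetical helpers; begin and max are zero-based row positions, rows is the batch size.

# Print the start offset of every batch, one per line.
batch_offsets() {
  local begin=$1 max=$2 rows=$3 start
  for (( start = begin; start < max; start += rows )); do
    echo "$start"   # this batch covers rows [start, start + rows)
  done
}

# Number of batches: ceiling of (max - begin) / rows using integer arithmetic.
# Assumes begin <= max.
batch_count() {
  local begin=$1 max=$2 rows=$3
  echo $(( (max - begin + rows - 1) / rows ))
}

batch_count 0 2500 1000     # prints 3
batch_offsets 0 2500 1000   # prints 0, 1000 and 2000 on separate lines
```

Rounding up matters because the last batch is usually a partial one. In this example the final request starts at row 2000 and only has 500 rows left to return.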
Yet more Bash Scripts!
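#!/bin/bash
# CHEEKY SOLR BATCHENATOR
# Grabs info from a query, batches it up then concats to an output file
# Which Solr Server
echo -e "Solr Server IP?"
read -r solrServer
# Which Keyspace
echo -e "Keyspace?"
read -r keyspace
# Which Shard
echo -e "Which Shard?"
read -r shard
# Which Shard Replica
echo -e "Which Replica?"
read -r replica
# q Query (must already be URL encoded)
echo -e "What is the value of q?"
read -r qQuery
# fl Query (must already be URL encoded)
echo -e "What is the value of fl?"
read -r flQuery
# How many rows do you want to return
echo -e "How many items to loop over?"
read -r max
# Where do you want to begin from - 0 or where you left off, e.g. 1000
echo -e "Beginning from what row?"
read -r begin
# How many rows do you want to return in each iterative run? ( Batch size )
echo -e "How many rows per cycle?"
read -r rowsPerCycle
# Calculate how many batches there are (rounding up for a partial last batch)
totalBatches=$(( (max - begin + rowsPerCycle - 1) / rowsPerCycle ))
batchNum=1
# Undertake the query on the cluster; start is zero-based, so stop once begin reaches max
while [ "$begin" -lt "$max" ]; do
  echo -e "Batch $batchNum of $totalBatches"
  # Run the query against Solr; the ${...} braces stop Bash reading
  # $keyspace_ as a (non-existent) variable called keyspace_
  curl -sS "http://${solrServer}:8080/solr/${keyspace}_${shard}_${replica}/select?q=${qQuery}&fl=${flQuery}&start=${begin}&rows=${rowsPerCycle}&wt=csv&indent=true" >>solr-output-curl.csv
  # Increment the counters
  begin=$(( begin + rowsPerCycle ))
  batchNum=$(( batchNum + 1 ))
done
# Exit successfully
exit 0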
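Appending every wt=csv response to the same file has a side effect. With Solr's default CSV output, each batch brings its own header line, so the combined file repeats the header once per batch. Below is a rough clean-up sketch. It assumes every batch shares the header that sits on line one of the file, and dedupe_csv_header and solr-output-clean.csv are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: drop the repeated CSV header lines left behind when several
# wt=csv responses are appended to one file. Hypothetical helper; assumes
# every batch starts with the same header, i.e. the first line of the file.
dedupe_csv_header() {
  local input=$1 output=$2
  # Keep line 1 (the header), then skip any later line identical to it.
  awk 'NR == 1 { header = $0; print; next } $0 != header' "$input" > "$output"
}

# Example usage against the script's output file:
# dedupe_csv_header solr-output-curl.csv solr-output-clean.csv
```

A data row that happened to match the header exactly would also be dropped, which seems unlikely for a typical fl list. Solr's CSV response writer also documents a csv.header parameter that switches the header off. Adding that to every batch after the first is another option, but it's worth checking against your Solr version.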
It's not mahoosively efficient in operation, but it did what was necessary to return the data I needed. Of course, the query needs to be URL
encoded. I pre-encoded mine in a Solr query interface and then just shoved it directly into the query section, but I've broken the parameters out
into separate prompts here in case that's of use.
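You might prefer to type q and fl in plain form rather than pre-encoding them elsewhere. In that case the encoding can happen in the script itself. Here is a minimal pure-Bash sketch that assumes ASCII input. urlencode is a hypothetical helper name. It leaves the RFC 3986 unreserved characters as they are and percent-encodes everything else:

```shell
#!/usr/bin/env bash
# Sketch: percent-encode a string for use as a URL query value.
# Hypothetical helper; assumes ASCII input.
urlencode() {
  local LC_ALL=C   # byte semantics and a plain ASCII character range
  local s=$1 out= c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [a-zA-Z0-9.~_-]) out+=$c ;;                    # unreserved: keep as-is
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;      # anything else: %XX
    esac
  done
  printf '%s\n' "$out"
}

qQuery=$(urlencode 'name:"Cheeky Solr" AND type:blog')
echo "$qQuery"   # name%3A%22Cheeky%20Solr%22%20AND%20type%3Ablog
```

The encoded values can then go into the URL in place of the raw prompt answers. Alternatively, curl can do the encoding itself. With -G, each --data-urlencode "q=..." argument has its value encoded and is appended to the URL as a query-string parameter.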