I had one of those moments where I needed to pull data out of Solr and realised that the nature of the query meant it was going to time out unless I ran it in batches. As I'm busy with many spinning plates at the moment (and rather averse to manual crap), I thought I'd write a quick and dirty piece of code to do things for me in the background.

#!/bin/bash
#  Grabs info from a query, batches it up then concats to an output file

# Which Solr Server
echo -e "Solr Server IP?"
read solrServer

# Which Keyspace
echo -e "Keyspace?"
read keyspace

# Which Shard
echo -e "Which Shard?"
read shard

# Which Shard Replica
echo -e "Which Replica?"
read replica

# q Query
echo -e "What is the value of q?"
read -r qQuery
# fl Query
echo -e "What is the value of fl?"
read -r flQuery

# Which row to stop at (when beginning from row 0, this is the total number of rows to return)
echo -e "How many items to loop over?"
read max
# Where do you want to begin from - 0 or where you left off, e.g. 1000
echo -e "Beginning from what row?"
read begin
# How many rows do you want to return in each iterative run? ( Batch size )
echo -e "How many rows per cycle?"
read rowsPerCycle
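# Calculate how many batches there are (rounding up so a partial final batch is counted)
let "batchCount=(max-begin+rowsPerCycle-1)/rowsPerCycle"
batchNum=1
# Undertake the query on the cluster
while [ "$begin" -lt "$max" ]
do
  echo -e "Batch $batchNum of $batchCount"
  # Don't request rows past $max on the final batch
  let "rows=max-begin"
  if [ "$rows" -gt "$rowsPerCycle" ]; then rows=$rowsPerCycle; fi
  # Solr's CSV writer starts every response with a header row, so only keep it for the first batch
  if [ "$batchNum" -eq 1 ]; then skip=1; else skip=2; fi
  # Run the query against Solr; the braces stop bash reading $keyspace_ as a variable called "keyspace_"
  curl -sS "http://$solrServer:8080/solr/${keyspace}_${shard}_${replica}/select?q=$qQuery&fl=$flQuery&start=$begin&rows=$rows&wt=csv&indent=true" | tail -n +"$skip" >>solr-output-curl.csv
  # Increment the counters; start is a 0-based offset, so the next batch begins exactly rowsPerCycle rows on
  let "begin=begin+rowsPerCycle"
  let "batchNum++"
done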
# Exit successfully
exit 0
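If you want to sanity-check the batching before pointing anything at a live cluster, the offset arithmetic can be pulled out into a function and dry-run on its own. This is a minimal sketch: `plan_batches` is a hypothetical helper (not part of Solr or the script above), and it assumes that `max` is the row number to stop at rather than a count of rows measured from `begin`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the batch count, then one "start rows" pair per batch,
# i.e. the values that would go into Solr's start= and rows= parameters.
# Usage: plan_batches BEGIN MAX ROWS_PER_CYCLE
plan_batches() {
  local begin=$1 max=$2 per=$3 rows
  # Round up so a partial final batch is still counted
  echo "batches: $(( (max - begin + per - 1) / per ))"
  while (( begin < max )); do
    # Cap the final batch so nothing past MAX is requested
    rows=$(( max - begin < per ? max - begin : per ))
    echo "$begin $rows"
    # start is a 0-based offset, so the next batch begins exactly $rows later (no +1)
    begin=$(( begin + rows ))
  done
}

plan_batches 0 2500 1000
```

Running `plan_batches 0 2500 1000` reports three batches starting at rows 0, 1000 and 2000, with the last one asking for only 500 rows, so no row is skipped or fetched twice.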

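The script's not mahoosively efficient in operation (each batch is a separate request, and Solr has to do more work the deeper the start offset goes), but it did what was necessary to return the data I needed. Of course, the q and fl values need to be URL encoded because they're dropped straight into the URL. I pre-encoded the query in a Solr query interface before just shoving it directly into the query section, but I've tried to break the parameters out here in case that's of use.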
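If you'd rather not pre-encode the query by hand, bash can percent-encode it for you. A minimal sketch, assuming a plain ASCII query: `urlencode` below is a hypothetical helper, not a standard command, and it escapes every byte outside the RFC 3986 unreserved set (letters, digits, `-`, `.`, `_` and `~`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent-encodes a string for use as a URL query parameter value.
# Bytes outside the RFC 3986 unreserved set are written as %XX.
urlencode() {
  local LC_ALL=C   # work byte by byte rather than by multibyte character
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [a-zA-Z0-9.~_-]) out+=$c ;;
      # printf "'X" yields the numeric code of X, printed here as two hex digits
      *) out+=$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

urlencode 'title:"spinning plates" AND year:[2010 TO *]'
```

The example call prints `title%3A%22spinning%20plates%22%20AND%20year%3A%5B2010%20TO%20%2A%5D`, which could then be given as the value of q. Alternatively, curl's `-G` and `--data-urlencode` options can do the encoding at request time instead of in a helper.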

Happy hacking!

