It's latency.
Unless you're game for upgrading your gear and going all pro, the simplest fix in your situation is to handle it after the fact: once you've recorded, zoom in on your vocal take(s), chop off the extra space at the beginning, and manually move the take left. Then check the waveform and listen to make sure it lines up correctly.
Do this with snap-to-grid turned off. That's usually a button you can enable or disable, and it controls whether your audio snaps to the grid when you move it. With it disabled, chop to your first peaks in a verse or chorus where you know the take should line up. Once that spot is lined up, you should be good the whole way through.
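If it helps to see the idea as numbers, here's a rough Python sketch of the same thing you'd be doing by eye: find the first peak in the reference and in the late take, then trim the difference off the front of the take. Everything in it is made up for illustration (the function names, the 0.1 threshold, the 44100 Hz sample rate, the sample values); it isn't how any particular DAW does it.

```python
def first_onset(samples, threshold=0.1):
    """Return the index of the first sample at or above the threshold, or None."""
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            return i
    return None


def shift_left(take, reference, threshold=0.1):
    """Trim the front of a late take so its first peak lands on the reference's.

    Returns the trimmed take and the number of samples removed.
    """
    take_onset = first_onset(take, threshold)
    ref_onset = first_onset(reference, threshold)
    if take_onset is None or ref_onset is None:
        return list(take), 0  # nothing loud enough to line up on
    delay = take_onset - ref_onset
    if delay <= 0:
        return list(take), 0  # take is not late, leave it alone
    return list(take[delay:]), delay


SAMPLE_RATE = 44100  # assumed project sample rate

# Hypothetical data: the take's first peak arrives 441 samples late.
reference = [0.0] * 100 + [0.8, -0.6, 0.4]
late_take = [0.0] * 541 + [0.7, -0.5, 0.3]

aligned, delay = shift_left(late_take, reference)
print(delay, round(delay * 1000 / SAMPLE_RATE, 1))  # 441 10.0
```

Here the take turns out to be 441 samples late, which is 10 ms at 44.1 kHz. A fixed offset like that is why lining up one spot tends to fix the whole take. Still trust your ears over the numbers.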